On 08/04/13 09:58 AM, Louis Letourneau wrote:
> But there is no rule of thumb? If the adapter can't service that many
> connections, would latency show this? So would using only latency to
> guess be ok?

Yes, that's a good indicator.

On some machines, you may get MPI_ERR_OTHER.

>
> I'm just wondering if a new tech comes out or if someone tries a special
> inter-node communication, how can you "guess" if you need routing or
> not, besides doing 2 full assemblies with and without.
>

There is an option for that:


mpiexec -n 16384 Ray -test-network-only -o Without-Routing


mpiexec -n 16384 Ray -test-network-only -o With-Routing -route-messages


Ideally, there would be an option called -route-messages-if-it-is-better ;-)
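
Until such an option exists, the comparison can be scripted by hand. The sketch below is only an illustration and is not part of Ray. network_test_commands prints the two test commands with separate output directories (With-Routing is an assumed directory name), and lower_latency compares two average latencies, in microseconds, that you read yourself from each run's network test results. Both function names are hypothetical.

```sh
# Print (rather than run) the two network-test commands for a given
# number of MPI ranks, so they can be pasted into a job script.
# The output directory names are illustrative assumptions.
network_test_commands() {
    ranks="$1"
    echo "mpiexec -n ${ranks} Ray -test-network-only -o Without-Routing"
    echo "mpiexec -n ${ranks} Ray -test-network-only -o With-Routing -route-messages"
}

# lower_latency WITHOUT_US WITH_US
# Compare two average latencies (microseconds, decimals allowed) and
# print which setup had the lower one. Ties go to "without-routing".
lower_latency() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        if (b + 0 < a + 0) print "with-routing"
        else print "without-routing"
    }'
}
```

For example, `lower_latency 120 45` prints `with-routing`. Latency is only an indicator, so a close result is not a strong argument either way.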


> Thanks
> Louis
>
> On 13-04-08 09:33 AM, Sébastien Boisvert wrote:
>> On 08/04/13 09:11 AM, Louis Letourneau wrote:
>>> When checking the NetworkTest results, at what point, at what avg.
>>> latency measure, do you think it's better to not use routing?
>>> 100µs, 50µs?
>>>
>>
>> On a Cray XE6 or on an IBM Blue Gene/Q, you never need the software message
>> routing.
>>
>> From experience, you don't need software message routing on an IBM
>> iDataPlex either.
>>
>> For everything else, it really depends on the setup. Sometimes, it is needed
>> because the host communication adapter cannot service that many
>> communication links per node.
>>
>>> I'm just wondering if there is a rule of thumb.
>>>
>>> Thanks
>>> Louis
>>>
>>> On 13-03-25 05:44 PM, Sébastien Boisvert wrote:
>>>> On 25/03/13 12:46 PM, Ino de Bruijn wrote:
>>>>> Thanks a lot for the prompt reply!
>>>>>
>>>>>    > What is your interconnect ?
>>>>>    >
>>>>>    > Do you have Infiniband ?
>>>>>
>>>>> It is a Cray XE6 system that uses the Cray Gemini interconnect 
>>>>> technology. These are the full specs: 
>>>>> http://www.pdc.kth.se/resources/computers/lindgren/hardware
>>>>
>>>> In that case, you don't need to route your messages!
>>>>
>>>> The Cray XE6 has the best interconnect out there! Gemini is a 3D torus.
>>>>
>>>>>
>>>>> Is the polytope connection type good for this type of interconnect as 
>>>>> well?
>>>>>
>>>>
>>>> You should remove the -route-messages option altogether.
>>>>
>>>>
>>>> -route-messages is useful when using buggy Infiniband fabrics or TCP 
>>>> networks. If you use something like:
>>>>
>>>> * Cray XE6
>>>> * IBM Blue Gene/Q
>>>> * Intel PSM (QLogic Infiniband)
>>>> * IBM iDataPlex
>>>>
>>>>
>>>> you don't need this option because these systems are really good and 
>>>> provide low-latency any-to-any message passing.
>>>>
>>>>
>>>>> Best regards,
>>>>> Ino
>>>>>
>>>>>    > Date: Mon, 25 Mar 2013 11:24:21 -0400
>>>>>    > From: sebastien.boisver...@ulaval.ca
>>>>>    > To: denovoassembler-users@lists.sourceforge.net
>>>>>    > Subject: Re: [Denovoassembler-users] Long execution time, seems to 
>>>>> be stuck at Rank 0
>>>>>    >
>>>>>    > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>>>>>    > > Dear Sébastien Boisvert,
>>>>>    > >
>>>>>    > > I am trying to assemble a paired-end Illumina Hiseq library of 
>>>>> about 1 billion reads. I ran Ray with:
>>>>>    > >
>>>>>    > > mpiexec -n 1024 Ray \
>>>>>    > > -k \
>>>>>    > > 31 \
>>>>>    > > -i \
>>>>>    > > metassemble/assemblies/ray/pair.fastq \
>>>>>    > > -o \
>>>>>    > > metassemble/assemblies/ray/out_31 \
>>>>>    > > -read-write-checkpoints \
>>>>>    > > metassemble/assemblies/ray/out_31.cp \
>>>>>    > > -route-messages
>>>>>    >
>>>>>    > Without other arguments, -route-messages will use a de Bruijn graph 
>>>>> for routing, which is not really good.
>>>>>    >
>>>>>    > What is your interconnect ?
>>>>>    >
>>>>>    > Do you have Infiniband ?
>>>>>    >
>>>>>    >
>>>>>    > Use this instead (the polytope is the best routing engine in 
>>>>> RayPlatform):
>>>>>    >
>>>>>    > -route-messages -connection-type polytope -routing-graph-degree 62
>>>>>    >
>>>>>    > (from 
>>>>> https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
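
One way to read the value 62, assuming the polytope is the usual generalization of the hypercube in which each rank is a word of d digits in some base b: there are b^d ranks, and each rank links to every rank that differs from it in exactly one digit, which gives a degree of d * (b - 1). With 1024 = 32^2 ranks this is 2 * 31 = 62. The sketch below only does that arithmetic. The function name polytope_degree is hypothetical, and Routing.txt remains the authority on which values Ray accepts.

```sh
# polytope_degree RANKS BASE
# Prints digits * (base - 1) when RANKS is an exact power of BASE.
# Returns 1 and prints nothing otherwise.
polytope_degree() {
    ranks="$1"
    base="$2"
    [ "$base" -ge 2 ] || return 1
    digits=0
    n=1
    # Multiply by the base until we reach or pass the rank count.
    while [ "$n" -lt "$ranks" ]; do
        n=$((n * base))
        digits=$((digits + 1))
    done
    [ "$n" -eq "$ranks" ] || return 1
    echo $((digits * (base - 1)))
}
```

`polytope_degree 1024 32` prints 62, and `polytope_degree 1024 2` prints 10, which is the plain hypercube case.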
>>>>>    >
>>>>>    > >
>>>>>    > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try 
>>>>> higher values of k (i.e. {41..81..10}), the run
>>>>>    > > is exited by the scheduler after a day (max run time is one day). 
>>>>> If I look at the log of the stdout it seems
>>>>>    > > like only Rank 0 is doing something at the end. Here are a 
>>>>> couple of lines from the output:
>>>>>    > >
>>>>>    >
>>>>>    > Well, you are using a de Bruijn graph for routing your messages. A 
>>>>> de Bruijn graph is theoretically cool for routing messages,
>>>>>    > but in practice it's very bad because it's not adaptive and it's 
>>>>> full of choke points.
>>>>>    >
>>>>>    > From the manual 
>>>>> https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
>>>>>    >
>>>>>    > -connection-type type
>>>>>    > Sets the connection type for routes.
>>>>>    > Accepted values are debruijn, hypercube, polytope, group, random, 
>>>>> kautz and complete. Default is debruijn.
>>>>>    >
>>>>>    >
>>>>>    >
>>>>>    > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>>>>>    > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>>>>>    > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 
>>>>> seconds
>>>>>    > >
>>>>>    > > This keeps going while only Rank 0 is outputting. The final 
>>>>> message says there are 30 minutes left for k=41. For 51 and 61 it is 
>>>>> around 10-20h left and for k=71 and k=81 it is about an hour again.
>>>>>    >
>>>>>    > You should definitely use the polytope. It has no choke points, and 
>>>>> routes are adaptive (i.e. messages between A and B will use several 
>>>>> paths).
>>>>>    >
>>>>>    > > Does it only use Rank 0 at this step because this step can only be 
>>>>> done by one core or is the graph that Rank 0 contains highly complex or 
>>>>> something?
>>>>>    >
>>>>>    > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>>>>>    >
>>>>>    > >
>>>>>    > > If I want to continue running Ray, can I resume the process by 
>>>>> running the same parameters, but using only one core (-n 1)? Or
>>>>>    > > should I use more cores?
>>>>>    >
>>>>>    > You have to re-launch Ray with the same command except for the -o 
>>>>> parameter, which can point to a new output directory.
>>>>>    >
>>>>>    > Example:
>>>>>    >
>>>>>    > mpiexec -n 1024 Ray \
>>>>>    > -k \
>>>>>    > 31 \
>>>>>    > -i \
>>>>>    > metassemble/assemblies/ray/pair.fastq \
>>>>>    > -o \
>>>>>    > metassemble/assemblies/ray/out_31 \
>>>>>    > -read-write-checkpoints \
>>>>>    > metassemble/assemblies/ray/out_31.cp \
>>>>>    > -route-messages -connection-type polytope -routing-graph-degree 62
>>>>>    >
>>>>>    >
>>>>>    > > When is the checkpointing done?
>>>>>    >
>>>>>    > At each step.
>>>>>    >
>>>>>    > To see your checkpoint files:
>>>>>    >
>>>>>    > ls metassemble/assemblies/ray/out_31.cp | less
>>>>>    >
>>>>>    >
>>>>>    >
>>>>>    > >
>>>>>    > > It seems like I have to remove the output dir to resume from a 
>>>>> checkpoint. Is that correct?
>>>>>    >
>>>>>    > No. It's not necessary. You can instead provide a new output 
>>>>> directory.
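
A minimal sketch of that relaunch, assuming the paths and options from the example command in this thread. The helper name resume_command and whatever directory you pass to it are hypothetical. It only prints the command so it can be checked before submitting the job: the checkpoint directory stays the same and only -o changes.

```sh
# resume_command NEW_OUTPUT_DIR
# Print a relaunch command that reuses the existing checkpoint
# directory but writes to a new output directory.
resume_command() {
    new_out="$1"
    echo "mpiexec -n 1024 Ray -k 31" \
         "-i metassemble/assemblies/ray/pair.fastq" \
         "-o ${new_out}" \
         "-read-write-checkpoints metassemble/assemblies/ray/out_31.cp" \
         "-route-messages -connection-type polytope -routing-graph-degree 62"
}
```

For example, `resume_command metassemble/assemblies/ray/out_31_resumed` prints the full command on one line.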
>>>>>    >
>>>>>    > >
>>>>>    > > Best regards,
>>>>>    > > Ino de Bruijn
>>>>>    >
>>>>>    >
>>>>>    >
>>>>>    >
>>>>>    > _______________________________________________
>>>>>    > Denovoassembler-users mailing list
>>>>>    > Denovoassembler-users@lists.sourceforge.net
>>>>>    > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users