So there is no rule of thumb? If the adapter can't service that many
connections, would that show up in the latency? If so, would it be OK to
decide based on latency alone?

I'm just wondering: if a new technology comes out, or if someone tries a
special inter-node communication setup, how can you "guess" whether you
need routing, besides running two full assemblies, one with routing and
one without?

Thanks
Louis
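
To make the "that many connections" point concrete, here is a minimal sketch of the arithmetic, using the 1024-rank run and the -routing-graph-degree 62 setting quoted further down in this thread. The function names are hypothetical, and the model is a simplification: it only counts distinct peers and says nothing about what a given adapter can actually sustain.

```python
def peers_per_rank(ranks, routing_degree=None):
    """Distinct ranks that each rank exchanges messages with directly."""
    if routing_degree is None:
        # No software routing: any-to-any, every rank talks to every other rank.
        return ranks - 1
    # With routing, a rank only talks to its neighbours in the routing graph.
    return min(routing_degree, ranks - 1)


def total_links(ranks, routing_degree=None):
    """Undirected rank-to-rank links in the whole job."""
    return ranks * peers_per_rank(ranks, routing_degree) // 2


if __name__ == "__main__":
    ranks = 1024
    print("direct:", peers_per_rank(ranks), "peers per rank,",
          total_links(ranks), "links")
    print("routed:", peers_per_rank(ranks, 62), "peers per rank,",
          total_links(ranks, 62), "links")
```

Routing trades extra hops for far fewer links per rank, which is why it only pays off when the adapter or fabric struggles with the any-to-any pattern.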

On 13-04-08 09:33 AM, Sébastien Boisvert wrote:
> On 08/04/13 09:11 AM, Louis Letourneau wrote:
>> When checking the NetworkTest results, at what average latency do you
>> think it's better not to use routing? 100µs, 50µs?
>>
> 
> On a Cray XE6 or on an IBM Blue Gene/Q, you never need the software message
> routing.
> 
> From experience, you don't need software message routing on an IBM iDataPlex
> either.
> 
> For everything else, it really depends on the setup. Sometimes it is needed
> because the host communication adapter cannot service that many
> communication links per node.
> 
>> I'm just wondering if there is a rule of thumb.
>>
>> Thanks
>> Louis
>>
>> On 13-03-25 05:44 PM, Sébastien Boisvert wrote:
>>> On 25/03/13 12:46 PM, Ino de Bruijn wrote:
>>>> Thanks a lot for the prompt reply!
>>>>
>>>>   > What is your interconnect ?
>>>>   >
>>>>   > Do you have Infiniband ?
>>>>
>>>> It is a Cray XE6 system that uses the Cray Gemini interconnect technology. 
>>>> These are the full specs: 
>>>> http://www.pdc.kth.se/resources/computers/lindgren/hardware
>>>
>>> In that case, you don't need to route your messages!
>>>
>>> The Cray XE6 has the best interconnect out there! It's a 3D torus.
>>>
>>>>
>>>> Is the polytope connection type good for this type of interconnect as well?
>>>>
>>>
>>> You should remove the -route-messages option altogether.
>>>
>>>
>>> -route-messages is useful when using buggy Infiniband fabrics or TCP 
>>> networks. If you use something like:
>>>
>>> * Cray XE6
>>> * IBM Blue Gene/Q
>>> * Intel PSM (QLogic Infiniband)
>>> * IBM iDataPlex
>>>
>>>
>>> you don't need this option because these systems are really good and 
>>> provide low-latency any-to-any message passing.
>>>
>>>
>>>> Best regards,
>>>> Ino
>>>>
>>>>   > Date: Mon, 25 Mar 2013 11:24:21 -0400
>>>>   > From: sebastien.boisver...@ulaval.ca
>>>>   > To: denovoassembler-users@lists.sourceforge.net
>>>>   > Subject: Re: [Denovoassembler-users] Long execution time, seems to be stuck at Rank 0
>>>>   >
>>>>   > On 25/03/13 05:32 AM, Ino de Bruijn wrote:
>>>>   > > Dear Sébastien Boisvert,
>>>>   > >
>>>>   > > I am trying to assemble a paired-end Illumina Hiseq library of about
>>>>   > > 1 billion reads. I ran Ray with:
>>>>   > >
>>>>   > > mpiexec -n 1024 Ray \
>>>>   > > -k \
>>>>   > > 31 \
>>>>   > > -i \
>>>>   > > metassemble/assemblies/ray/pair.fastq \
>>>>   > > -o \
>>>>   > > metassemble/assemblies/ray/out_31 \
>>>>   > > -read-write-checkpoints \
>>>>   > > metassemble/assemblies/ray/out_31.cp \
>>>>   > > -route-messages
>>>>   >
>>>>   > Without other arguments, -route-messages will use a de Bruijn graph
>>>>   > for routing, which is not really good.
>>>>   >
>>>>   > What is your interconnect ?
>>>>   >
>>>>   > Do you have Infiniband ?
>>>>   >
>>>>   >
>>>>   > Use this instead (the polytope is the best routing engine in RayPlatform):
>>>>   >
>>>>   > -route-messages -connection-type polytope -routing-graph-degree 62
>>>>   >
>>>>   > (from https://github.com/sebhtml/ray/blob/master/Documentation/Routing.txt )
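
One way to see where a degree of 62 can come from for 1024 ranks is sketched below. It assumes that the polytope connection type numbers each rank as a string of `dimension` digits in some `base`, and links two ranks when they differ in exactly one digit. That model is an assumption made for illustration, not something stated in this thread, and the function names are hypothetical.

```python
def polytope_shapes(ranks):
    """List (base, dimension, degree) for every base**dimension == ranks."""
    shapes = []
    for base in range(2, ranks + 1):
        size, dimension = base, 1
        while size < ranks:
            size *= base
            dimension += 1
        if size == ranks:
            # Each of the `dimension` digits can change to (base - 1) other values.
            shapes.append((base, dimension, dimension * (base - 1)))
    return shapes


def neighbours(rank, base, dimension):
    """Ranks that differ from `rank` in exactly one base-`base` digit."""
    result = []
    weight = 1
    for _ in range(dimension):
        digit = (rank // weight) % base
        for other in range(base):
            if other != digit:
                result.append(rank + (other - digit) * weight)
        weight *= base
    return result


if __name__ == "__main__":
    for base, dimension, degree in polytope_shapes(1024):
        print(f"base={base} dimension={dimension} degree={degree}")
    print(len(neighbours(0, 32, 2)), "neighbours for rank 0 with base 32")
```

Under this assumption, 1024 = 32^2 gives a degree of 2 * 31 = 62, while 1024 = 2^10 gives the ordinary hypercube with degree 10.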
>>>>   >
>>>>   > >
>>>>   > > For k=31 the assembly succeeds in ~9h on 1,024 cores. If I try higher
>>>>   > > values of k (i.e. {41..81..10}), the run is killed by the scheduler
>>>>   > > after a day (the maximum run time is one day). In the stdout log, it
>>>>   > > looks like only Rank 0 is doing anything at the end. Here are a couple
>>>>   > > of lines from the output:
>>>>   > >
>>>>   >
>>>>   > Well, you are using a de Bruijn graph for routing your messages. A de
>>>>   > Bruijn graph is theoretically cool for routing messages, but in practice
>>>>   > it works badly because it's not adaptive and it has many choke points.
>>>>   >
>>>>   > From the manual https://github.com/sebhtml/ray/blob/master/MANUAL_PAGE.txt :
>>>>   >
>>>>   > -connection-type type
>>>>   > Sets the connection type for routes.
>>>>   > Accepted values are debruijn, hypercube, polytope, group, random,
>>>>   > kautz and complete. Default is debruijn.
>>>>   >
>>>>   >
>>>>   >
>>>>   > > Rank 0 is counting k-mers in sequence reads [51200001/249559758]
>>>>   > > Speed RAY_SLAVE_MODE_ADD_VERTICES 4909 units/second
>>>>   > > Estimated remaining time for this step: 11 hours, 13 minutes, 27 seconds
>>>>   > >
>>>>   > > This keeps going while only Rank 0 is outputting. The final message
>>>>   > > says there are 30 minutes left for k=41. For 51 and 61 it is around
>>>>   > > 10-20h left and for k=71 and k=81 it is about an hour again.
>>>>   >
>>>>   > You should definitely use the polytope. It has no choke points, and
>>>>   > routes are adaptive (i.e. messages between A and B will use several
>>>>   > paths).
>>>>   >
>>>>   > > Does it only use Rank 0 at this step because this step can only be
>>>>   > > done by one core, or because the graph that Rank 0 holds is highly
>>>>   > > complex, or something else?
>>>>   >
>>>>   > At 1024 MPI ranks, rank 0 is one of the hubs in a de Bruijn graph.
>>>>   >
>>>>   > >
>>>>   > > If I want to continue running Ray, can I resume the process by
>>>>   > > running with the same parameters, but using only one core (-n 1)? Or
>>>>   > > should I use more cores?
>>>>   >
>>>>   > You have to re-launch Ray with the same command, changing only the -o
>>>>   > parameter so that it points to a new output directory.
>>>>   >
>>>>   > Example (out_31_resumed is just an illustrative name for the new directory):
>>>>   >
>>>>   > mpiexec -n 1024 Ray \
>>>>   > -k \
>>>>   > 31 \
>>>>   > -i \
>>>>   > metassemble/assemblies/ray/pair.fastq \
>>>>   > -o \
>>>>   > metassemble/assemblies/ray/out_31_resumed \
>>>>   > -read-write-checkpoints \
>>>>   > metassemble/assemblies/ray/out_31.cp \
>>>>   > -route-messages -connection-type polytope -routing-graph-degree 62
>>>>   >
>>>>   >
>>>>   > > When is the checkpointing done?
>>>>   >
>>>>   > At each step.
>>>>   >
>>>>   > To see your checkpoint files:
>>>>   >
>>>>   > ls metassemble/assemblies/ray/out_31.cp | less
>>>>   >
>>>>   >
>>>>   >
>>>>   > >
>>>>   > > It seems like I have to remove the output dir to resume from a
>>>>   > > checkpoint. Is that correct?
>>>>   >
>>>>   > No. It's not necessary. You can instead provide a new output directory.
>>>>   >
>>>>   > >
>>>>   > > Best regards,
>>>>   > > Ino de Bruijn
>>>>   >
>>>>   >
>>>>   >
>>>>   >
>>>>   > 
>>>>   > _______________________________________________
>>>>   > Denovoassembler-users mailing list
>>>>   > Denovoassembler-users@lists.sourceforge.net
>>>>   > https://lists.sourceforge.net/lists/listinfo/denovoassembler-users