Ok, I have an update to this issue. I believe there is an implementation difference in sched_yield between Linux and Solaris. If I change the sched_yield call in opal_progress to a usleep(500), then my program completes quite quickly. I have sent a few questions to a Solaris engineer and hopefully will get some useful information.
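
For illustration, here is a minimal sketch of the experiment described above. It is not the actual opal_progress() source; the helper name and the USE_USLEEP_BACKOFF switch are made up, but it shows the shape of the change: when a progress pass finds nothing completed, back off with usleep(500) instead of sched_yield().

    /* Minimal sketch only -- not the real opal_progress(); the function
     * name and the USE_USLEEP_BACKOFF switch are hypothetical. */
    #include <sched.h>
    #include <unistd.h>

    static void progress_idle_backoff(int events_completed)
    {
        if (events_completed == 0) {
    #ifdef USE_USLEEP_BACKOFF
            usleep(500);        /* 500 us sleep: actually gives up the CPU */
    #else
            sched_yield();      /* original behavior; on Solaris this reduces to yield() */
    #endif
        }
    }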

That being said, CT-6's implementation also used yield calls (note this is actually what sched_yield reduces down to in Solaris), and we did not see the same degradation issue as with Open MPI. I believe the reason is that CT-6's SM implementation does not call CT-6's version of progress recursively, and so does not force all the unexpected messages to be read in before continuing. CT-6 also has natural flow control in its implementation (i.e., it has a fixed-size fifo for eager messages).

I believe both of these characteristics help CT-6 avoid being completely killed by the yield differences.
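
As a rough illustration of the flow-control point, here is a hypothetical sketch (the names and the capacity are mine, not CT-6 or Open MPI code) of how a fixed-size eager fifo gives back-pressure for free: once the fifo is full, the sender cannot post another eager fragment until the receiver drains some, which naturally throttles the build-up of unexpected messages.

    /* Hypothetical sketch of a fixed-size eager fifo; not CT-6 source. */
    #include <stdbool.h>
    #include <stddef.h>

    #define EAGER_FIFO_SLOTS 128            /* fixed capacity, illustrative */

    typedef struct {
        void  *slot[EAGER_FIFO_SLOTS];
        size_t head;                        /* next slot the sender writes */
        size_t tail;                        /* next slot the receiver reads */
    } eager_fifo_t;

    /* Returns false when the fifo is full; the sender must retry later,
     * so eager traffic is rate-limited by how fast the receiver drains. */
    static bool eager_fifo_push(eager_fifo_t *f, void *frag)
    {
        size_t next = (f->head + 1) % EAGER_FIFO_SLOTS;
        if (next == f->tail) {
            return false;                   /* full: back-pressure on sender */
        }
        f->slot[f->head] = frag;
        f->head = next;
        return true;
    }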

--td


Li-Ta Lo wrote:

On Thu, 2007-08-30 at 12:45 -0400, terry.don...@sun.com wrote:
Li-Ta Lo wrote:

On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:

Li-Ta Lo wrote:

On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:

hmmm, interesting since my version doesn't abort at all.

Some problem with the Fortran compiler/language binding? My C translation doesn't have any problem.

[ollie@exponential ~]$ mpirun -np 4 a.out 10
Target duration (seconds): 10.000000, #of msgs: 50331, usec per msg: 198.684707

Did you oversubscribe? I found np=10 on an 8-core system clogged things up sufficiently.

Yeah, I used np=10 on a 2-processor, 2-hyperthread system (4 threads total).

Is this using Linux?

Yes.

Ollie

