On Aug 17, 2005, at 8:23 AM, Sridhar Chirravuri wrote:

Can someone reply to my mail please?

I think you sent your first mail at 6:48am in my time zone (that is 4:48am Los Alamos time -- I strongly doubt that they are at work yet...); I'm still processing my mail from last night and am just now seeing your mail.

Global software development is challenging.  :-)

I checked out the latest code drop r6911 this morning and ran Pallas
within a single node (2 procs). It ran fine. I didn't see any hangs
this time, but I did see the following statements in the Pallas
output. I feel they are just warnings that can be ignored. Am I
correct?

Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
Request for 0 bytes (coll_basic_reduce.c, 194)
Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
Request for 0 bytes (coll_basic_reduce.c, 194)
Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
Request for 0 bytes (coll_basic_reduce.c, 194)

Hum. I was under the impression that George had fixed these, but I get the same warnings. I'll have a look...
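
For context: these messages usually mean an allocation path was handed a size of zero. Here's a hedged sketch of the general pattern in C -- checked_alloc is a hypothetical stand-in, not the actual Open MPI source:

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical allocator wrapper: warns (rather than fails) when a
     * caller computes a buffer size that comes out to zero. */
    static void *checked_alloc(size_t bytes, const char *file, int line)
    {
        if (0 == bytes) {
            fprintf(stderr, "Request for 0 bytes (%s, %d)\n", file, line);
        }
        return malloc(bytes);  /* malloc(0) is legal C; result is impl-defined */
    }

    int main(void)
    {
        /* A count of 0 is legitimate in reduce/reduce_scatter for some
         * ranks, which is presumably how the zero-byte request arises. */
        int count = 0;
        void *buf = checked_alloc((size_t) count * sizeof(double),
                                  "coll_basic_reduce_scatter.c", 80);
        free(buf);
        return 0;
    }

If the zero-size case is harmless, silencing the warning (or skipping the allocation entirely) for that case should be all that's needed.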

Here is the output of a sample MPI program that sends and receives a
char.

[root@micrompi-1 ~]# mpirun -np 2 ./a.out
Could not join a running, existing universe
Establishing a new one named: default-universe-12913
[0,0,0] mca_oob_tcp_init: calling orte_gpr.subscribe
[0,0,0] mca_oob_tcp_init: calling orte_gpr.put(orte-job-0)
[snipped]
[0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2
[0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2
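
For reference, a minimal char send/recv along those lines -- my reconstruction, not necessarily the actual test program -- would look something like this:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        char c = 'x';

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (0 == rank) {
            /* rank 0 sends one char to rank 1 */
            MPI_Send(&c, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        } else if (1 == rank) {
            /* rank 1 receives it */
            MPI_Recv(&c, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received '%c'\n", c);
        }
        MPI_Finalize();
        return 0;
    }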

This seems to be a *lot* of debugging output -- did you enable that on purpose? I don't get the majority of that output when I run a hello world or a ring MPI program (I only get the bit about the existing universe).
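
If you didn't turn it on yourself, it's worth checking whether a debug/verbose MCA parameter got set somewhere (an environment variable or a params file). Listing all parameters and their current values should show it:

        ompi_info --param all all

(I'm citing that invocation from memory; "ompi_info --help" will confirm the exact syntax.)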

My configure command looks like this:

./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
--enable-mca-no-build=btl-openib,pml-teg,pml-uniq

Since I am working with the mvapi component, I disabled openib.

Note that you can disable these things at run-time; you don't have to disable them at configure time. I only mention this for completeness -- either way, it's disabled.
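
For example, something like this should exclude openib at run time (the "^" prefix means "everything except these components"):

        mpirun --mca btl ^openib -np 2 ./a.out

or the equivalent "btl = ^openib" line in the openmpi-mca-params.conf file.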

But I could see that data is going over TCP/GigE and not on Infiniband.

Tim: what's the status of multi-rail stuff? I thought I saw a commit recently where the TCP BTL would automatically disable itself if it saw that one or more of the low-latency BTLs was available...?

Sridhar: Did you try explicitly requesting mvapi at run time? Perhaps something like:

        mpirun --mca btl mvapi,self ....

This shouldn't be necessary -- mvapi should select itself automatically -- but perhaps something is going wrong with the mvapi selection sequence...? Tim/Galen -- got any insight here?
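
One quick sanity check is to confirm that the mvapi BTL actually built and got installed -- it should show up in the component listing:

        ompi_info | grep btl

If mvapi isn't in that list, the selection logic never had a chance to pick it.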

I have run Pallas; it simply hangs again :-(

I'm confused -- above, you said that you ran Pallas and it worked fine...?

(it does not hang for me when I run with teg or ob1)

Note: I added pml=ob1 in the conf file
/openmpi/etc/openmpi-mca-params.conf
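
For reference, that file takes one "param = value" pair per line, so that should just be a line like:

        pml = ob1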

Have any new options been added to the configure command lately? Please
let me know.

No, nothing changed there AFAIK.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
