Hi,

I think the fact that mpiexec -n 48 -hostfile hostfile.actinode34 date printed the
date 48 times indicates that processes are launched on both nodes.

However, there is probably a problem with how messages are sent over your network.

Could you run these network tests, paste each result on http://pastebin.com/
(one paste per test), and link them in your reply?

Each of these tests will take a few seconds to run.


Test 1

mpiexec -n 48 -hostfile hostfile.actinode34 \
--mca mca_verbose 9999999 \
--mca btl_base_verbose 9999999 \
--mca btl_openib_verbose 9999999 \
/home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray \
-test-network-only -o NetworkTest1 &> NetworkTest1.txt


Test 2

mpiexec -n 48 -hostfile hostfile.actinode34 \
--mca mca_verbose 9999999 \
--mca btl_base_verbose 9999999 \
--mca btl_openib_verbose 9999999 \
--mca btl self,tcp \
/home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray \
-test-network-only -o NetworkTest2 &> NetworkTest2.txt


Test 3

mpiexec -n 48 -hostfile hostfile.actinode34 \
--mca mca_verbose 9999999 \
--mca btl_base_verbose 9999999 \
--mca btl_openib_verbose 9999999 \
--mca btl self,openib \
/home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray \
-test-network-only -o NetworkTest3 &> NetworkTest3.txt
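Tests 1 to 3 use identical command lines except for the --mca btl setting: Test 1 leaves transport selection to Open MPI, Test 2 restricts it to self,tcp, and Test 3 to self,openib. As a sketch, assuming bash, the three runs could be driven from one loop. run_network_tests is a hypothetical helper name, and MPIEXEC, RAY and HOSTFILE are hypothetical override variables that default to the values used above.

```shell
#!/usr/bin/env bash
# Sketch: run Tests 1-3 from one loop. The only difference between them
# is the BTL selection: none (Open MPI chooses), self,tcp, and self,openib.
# MPIEXEC, RAY and HOSTFILE are hypothetical overrides; the defaults are
# the values used in the commands above.

run_network_tests() {
    local mpiexec=${MPIEXEC:-mpiexec}
    local ray=${RAY:-/home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray}
    local hostfile=${HOSTFILE:-hostfile.actinode34}
    local -a btl=("" "self,tcp" "self,openib")
    local -a extra
    local i n
    for i in 0 1 2; do
        n=$((i + 1))               # test number: 1, 2, 3
        extra=()                   # Test 1 passes no --mca btl option
        if [ -n "${btl[$i]}" ]; then
            extra=(--mca btl "${btl[$i]}")
        fi
        "$mpiexec" -n 48 -hostfile "$hostfile" \
            --mca mca_verbose 9999999 \
            --mca btl_base_verbose 9999999 \
            --mca btl_openib_verbose 9999999 \
            "${extra[@]}" \
            "$ray" -test-network-only -o "NetworkTest$n" &> "NetworkTest$n.txt"
    done
}
```

Calling run_network_tests from the working directory produces NetworkTest1.txt to NetworkTest3.txt, just as the three separate commands do.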


Test 4


/sbin/ifconfig -a &> NetworkTest4.txt


Test 5

ssh actinode03 /sbin/ifconfig -a &> NetworkTest5.txt


Test 6

ssh actinode04 /sbin/ifconfig -a &> NetworkTest6.txt
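Tests 5 and 6 repeat the same ifconfig call over ssh on each compute node. A sketch, assuming bash, passwordless ssh (as Tests 5 and 6 already require) and a hostfile in the "host slots=N" format quoted later in this thread, could derive the node list from the hostfile instead. hosts_from_hostfile and collect_ifconfig are hypothetical helper names, RSH is a hypothetical override for the remote-shell command, and the NetworkTest-<host>.txt file names differ from the NetworkTest5/6 names requested above.

```shell
#!/usr/bin/env bash
# Sketch: repeat Tests 5 and 6 for every host named in a hostfile.
# Assumes lines of the form "hostname slots=N"; '#' starts a comment.

# Print the host names, skipping blank lines and comment lines.
hosts_from_hostfile() {
    awk '$1 !~ /^#/ && NF > 0 { print $1 }' "$1"
}

# Run /sbin/ifconfig -a on every host and save each result to
# NetworkTest-<host>.txt. RSH (hypothetical) defaults to ssh and is left
# unquoted so it may carry extra options.
collect_ifconfig() {
    local hostfile=$1
    local rsh=${RSH:-ssh}
    local host
    for host in $(hosts_from_hostfile "$hostfile"); do
        $rsh "$host" /sbin/ifconfig -a > "NetworkTest-$host.txt" 2>&1
    done
}
```

Usage: collect_ifconfig hostfile.actinode34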



                 Sébastien



On 2012-05-15 09:40, Keith Robison wrote:
Apologies for replying to my own message; something is amiss with my subscription, and I didn't see Sebastien's helpful reply (I went to the archives to get it).

Sebastien: Which machine are you on when launching mpirun/mpiexec?

I'm launching the jobs from the head node (actinode).

Sebastien suggested I try pinging one node from another, which failed -- so that is a clue:

ssh actinode03 ping actinode04
ping: icmp open socket: Operation not permitted

He also suggested I try
 mpiexec -n 48 -hostfile hostfile.actinode34 date

This prints the date 48 times, so that works.

Sebastien also suggested I run

ompi_info -a

It gives a lot of output.


Thanks for being so helpful! I'm feeling like I don't even know the right questions to ask, so getting any direction is really a boost.

Keith R.



On Sat, May 12, 2012 at 5:58 PM, Keith Robison <keith.e.robi...@gmail.com> wrote:

    Hello!  I've run into a roadblock.

    If I run the following command in the background, the assembler
    seems to stall; the last thing it prints is the citation for the
    assembler:

    mpirun -hostfile hostfile.actinode34 -np 48 -stdin /dev/null \
        /home/krobison/packages/Ray-v2.0-ReleaseCandidate5/Ray \
        -i part.8.fasta -o ray.part.8.actinode34.c \
        1> ray.part.8.actinode34.c.out 2> ray.part.8.actinode34.c.err

    Where hostfile.actinode34 reads:

    actinode03 slots=24
    actinode04 slots=24


    If I instead run with a hostfile listing only one host (either one
    of them) and -np 24, but otherwise the same command line, the
    assembler seems to be off and running.

    My .bashrc contains:

    export PATH=$PATH:/act/openmpi/gnu/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/act/openmpi/gnu/lib

    (The cluster vendor put the code in /act.)

    Any suggestions as to what might be triggering this behavior?
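The two-node run requests 48 processes against a hostfile that declares 24 slots on each of two nodes, while the one-host runs use -np 24. A minimal bash/awk sketch for totalling the declared slots, assuming every host line carries an explicit slots=N entry (host lines without one are simply not counted here); total_slots is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: total the slots declared in an Open MPI-style hostfile.
# Only explicit "slots=N" entries are summed; comment lines are skipped.

total_slots() {
    awk '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); total += $i }
        }
        END { print total + 0 }
    ' "$1"
}
```

For the two-line hostfile.actinode34 above, total_slots hostfile.actinode34 prints 48, matching -np 48.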



_______________________________________________
Denovoassembler-users mailing list
Denovoassembler-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/denovoassembler-users
