Thanks, will do. I'll get back to you soon.

-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: Friday, July 01, 2011 5:00 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Question about hanging mpirun

It sounds like you have a deadlock in your MPI application.

You might want to attach a debugger and see where the MPI processes are stuck.
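
For example, here is a minimal sketch (not a tested recipe) of dumping stack traces from the hung processes. It assumes gdb and pgrep are installed on the target machine, that you are allowed to attach to the processes (same user or root), and that the binary is named my_application as in the original post. dump_backtraces is a hypothetical helper name.

```shell
# Hypothetical helper: print a backtrace of every thread in each process
# whose command line matches the given pattern.
# Assumes pgrep and gdb are available on the machine.
dump_backtraces() {
    pattern=$1
    # pgrep -f matches against the full command line of each process
    pids=$(pgrep -f "$pattern")
    if [ -z "$pids" ]; then
        echo "no running process matches: $pattern" >&2
        return 1
    fi
    for pid in $pids; do
        echo "=== PID $pid ==="
        # Attach, print all thread backtraces, then detach and exit
        gdb -p "$pid" -batch -ex "thread apply all bt"
    done
}
```

With the job hung, running "dump_backtraces my_application" from a second terminal should show where each rank is blocked. The pattern will also match mpirun itself, since its command line contains the application name.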


On Jul 1, 2011, at 4:49 PM, Ralph Castain wrote:

> I'm afraid there isn't enough info here to advise - I don't know which poll 
> is failing. What function is calling poll?
> 
> Could be a problem with the event library, but I don't know. Have you tried 
> using "-mca btl sm,self" instead of tcp?
> 
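
A sketch of that comparison, reusing the install path and binary name from the original post. sm is the shared-memory BTL, which is enough when all processes run on one machine. The mpirun_cmd helper is hypothetical and only prints the command line so it can be checked before it is run.

```shell
# Hypothetical helper: print (do not run) the mpirun command for a BTL list.
# The install path and binary name are taken from the original post.
mpirun_cmd() {
    btl_list=$1   # e.g. "sm,self" (shared memory) or "tcp,self"
    nprocs=$2     # number of processes to launch
    echo "/usr/local/openmpi/bin/mpirun --mca btl $btl_list -np $nprocs ./my_application"
}

# Print both variants side by side for comparison
mpirun_cmd sm,self 2
mpirun_cmd tcp,self 2
```

If the hang persists with sm,self as well, the TCP BTL is less likely to be the cause.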
> 
> On Jul 1, 2011, at 2:37 PM, Colon, Joseanibal wrote:
> 
>> I got the LD_LIBRARY_PATH correct and I don't have other installations on 
>> the target machine, but it doesn't fix it. I had the suspicion about 
>> "./configure" building support for stuff on my machine that is not available 
>> on the target machine. Unfortunately the machines are not exactly identical, 
>> definitely in terms of hardware. The only similarities are the OS and the 
>> x86_64 architecture (this is OpenSUSE 11, SP1).
>> As you correctly guessed I want to run this on a single machine, and all 
>> processes are local. There is some intercommunication going on as well, but 
>> all using the MPI API. I am guessing that my problem has to do with 
>> intercommunications (since strace shows infinite calls to 'poll()'), 
>> probably because mpirun is trying to use features that were configured on my 
>> machine but not present on the target. Does that make sense?
>> I figured I don't need any fancy support to just run a couple of processes 
>> in parallel locally.  What would be the most basic configuration I can use 
>> to ensure that this will run on my target machine? (a machine that probably 
>> doesn't have support for a lot of the components - no IB devices found). I 
>> want openmpi to use the simplest form available. Thanks!
>>  
>> -Joseanibal
>>  
>>  
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On 
>> Behalf Of Ralph Castain
>> Sent: Friday, July 01, 2011 3:50 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Question about hanging mpirun
>>  
>> Make sure your LD_LIBRARY_PATH will pick up this installation before anything 
>> else - it's possible it is picking up an old one.
>>  
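
One way to check this, as a sketch: put the copied installation first on the search paths, then ask the dynamic linker which MPI library the binary resolves to. This assumes the libraries live under /usr/local/openmpi/lib (the usual layout for that prefix; some systems use lib64 instead). show_mpi_lib is a hypothetical helper name.

```shell
# Put the copied Open MPI installation first on the search paths.
# Assumption: libraries are under $OMPI_PREFIX/lib.
OMPI_PREFIX=/usr/local/openmpi
export PATH="$OMPI_PREFIX/bin:$PATH"
# Append the old value only if it was non-empty, to avoid a trailing ':'
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Hypothetical helper: list the MPI-related libraries a binary would load.
show_mpi_lib() {
    ldd "$1" | grep -i mpi
}
```

Running "show_mpi_lib ./my_application" on the target machine should then list libmpi under /usr/local/openmpi/lib; any other location would suggest a different installation is being picked up.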
>> I take it that you are running this on a single machine? So all the procs 
>> are local?
>>  
>> Only other issue is that OMPI's configure does a lot of testing to detect 
>> the local environment. So you might be building support for things that 
>> aren't on your target machine, and vice versa. If you have to do it this 
>> way, you need to ensure that the two machines are absolutely identical, both 
>> in hardware and software (watch for those installed packages!).
>>  
>>  
>> On Jul 1, 2011, at 10:42 AM, Colon, Joseanibal wrote:
>> 
>> 
>> My mpi application is hanging forever when called with mpirun -np >1 (that 
>> is 2 or more... not actually typing the '>').
>>  
>> So I built openmpi 1.4.3 with default options except I used 
>> -prefix=/usr/local/openmpi. I compiled an application against it but I need 
>> to run this application elsewhere. So I brought my entire installation 
>> directory /usr/local/openmpi to this new machine along with my binary to 
>> test it. I ran the following command (if I didn't use the -mca options it 
>> would print out messages about missing OpenFabrics):
>> /usr/local/openmpi/bin/mpirun --mca btl tcp,self -np 2 ./my_application
>>  
>> This actually works for -np 1. But requesting another process makes the call 
>> hang forever. 'strace' of the above call shows never-ending calls to 
>> "poll" resulting in (timeout) every time.
>> Executing /usr/local/openmpi/bin/ompi_info still shows the configure and 
>> build host as the machine I built on, but I don't know if this may cause a 
>> problem. I also see "Thread support: posix (mpi: no, progress: no)"
>>  
>> Unfortunately I need to do it this way. I cannot build openmpi on the 
>> target machine, so I need to make it portable. This other machine should be 
>> the same architecture and OS and everything.
>>  
>> I should have solved this yesterday, please help, and thanks!
>>  
>> -Joseanibal
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


