Thanks, will do, I'll get back to you soon
-----Original Message-----
From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Friday, July 01, 2011 5:00 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] Question about hanging mpirun

It sounds like you have a deadlock in your MPI application. You might want to attach a debugger and see where the MPI processes are stuck.

On Jul 1, 2011, at 4:49 PM, Ralph Castain wrote:

> I'm afraid there isn't enough info here to advise - I don't know which poll
> is failing. What function is calling poll?
>
> Could be a problem with the event library, but I don't know. Have you tried
> using "-mca btl sm,self" instead of tcp?
>
>
> On Jul 1, 2011, at 2:37 PM, Colon, Joseanibal wrote:
>
>> I got the LD_LIBRARY_PATH correct and I don't have other installations on
>> the target machine, but it doesn't fix it. I had the suspicion about
>> "./configure" building support for stuff on my machine that is not available
>> on the target machine. Unfortunately the machines are not exactly identical,
>> definitely in terms of hardware. The only similarities are the OS and the
>> x86_64 architecture (this is OpenSUSE 11, SP1).
>> As you correctly guessed I want to run this on a single machine, and all
>> processes are local. There is some intercommunication going on as well, but
>> all using MPI API. I am guessing that my problem has to do with
>> intercommunications (since strace shows infinite calls to 'poll()'),
>> probably because mpirun is trying to use features that were configured on my
>> machine but not present on the target. Does that make sense?
>> I figured I don't need any fancy support to just run a couple of processes
>> in parallel locally. What would be the most basic configuration I can use
>> to ensure that this will run on my target machine? (a machine that probably
>> doesn't have support for a lot of the components - no IB devices found). I
>> want openmpi to use the simplest form available. Thanks!
>>
>> -Joseanibal
>>
>>
>> From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On
>> Behalf Of Ralph Castain
>> Sent: Friday, July 01, 2011 3:50 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] Question about hanging mpirun
>>
>> Make sure your LD_LIBRARY_PATH will pick up this installation before anything
>> else - it's possible it is picking up an old one.
>>
>> I take it that you are running this on a single machine? So all the procs
>> are local?
>>
>> Only other issue is that OMPI's configure does a lot of testing to detect
>> the local environment. So you might be building support for things that
>> aren't on your target machine, and vice versa. If you have to do it this
>> way, you need to ensure that the two machines are absolutely identical, both
>> in hardware and software (watch for those installed packages!).
>>
>>
>> On Jul 1, 2011, at 10:42 AM, Colon, Joseanibal wrote:
>>
>>
>> My mpi application is hanging forever when called with mpirun -np >1 (that
>> is 2 or more... not actually typing the '>').
>>
>> So I built openmpi 1.4.3 with default options except I used
>> -prefix=/usr/local/openmpi. I compiled an application against it but I need
>> to run this application elsewhere. So I brought in my entire installation
>> directory /usr/local/openmpi to this new machine along with my binary to
>> test it. Ran the following command... (If I didn't use the -mca options it
>> would print out messages about missing OpenFabrics):
>> /usr/local/openmpi/bin/mpirun --mca btl tcp,self -np 2 ./my_application
>>
>> This actually works for -np 1. But requesting another process makes the call
>> hang forever. 'strace' of the above call shows never-ending calls to
>> "poll" resulting in (timeout) every time.
>> Executing /usr/local/openmpi/bin/ompi_info still shows the configure and
>> build host as the machine I built on, but I don't know if this may cause a
>> problem.
>> I also see "Thread support: posix (mpi: no, progress: no)"
>>
>> Unfortunately I need to do it this way. I cannot build openmpi on the
>> target machine, so I need to make it portable. This other machine should be
>> the same architecture and OS and everything.
>>
>> I should have solved this yesterday, please help, and thanks!
>>
>> -Joseanibal
>> _______________________________________________
>> devel mailing list
>> de...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
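
For reference, the three checks suggested in this thread (confirm which MPI library the binary resolves to, rerun over shared memory instead of tcp, and attach a debugger to a stuck process) could be scripted roughly as below. This is only a sketch under assumptions: the helper names (local_run_cmd, libmpi_check_cmd, backtrace_cmd) are made up for illustration and are not Open MPI tools, the install path and ./my_application come from the original post, and each helper merely prints a command line rather than running it.

```shell
#!/bin/sh
# Sketch only. MPIRUN defaults to the install prefix mentioned in the thread;
# override it in the environment if the installation lives elsewhere.
MPIRUN="${MPIRUN:-/usr/local/openmpi/bin/mpirun}"

# Print the mpirun command for an all-local run that uses the shared-memory
# and self BTLs instead of tcp (Ralph's suggestion).
#   $1 = number of processes, $2 = application binary
local_run_cmd() {
    printf '%s --mca btl sm,self -np %s %s\n' "$MPIRUN" "$1" "$2"
}

# Print the command that shows which MPI shared libraries a binary resolves
# to, to confirm LD_LIBRARY_PATH picks up the intended installation.
#   $1 = application binary
libmpi_check_cmd() {
    printf 'ldd %s | grep -i mpi\n' "$1"
}

# Print the command that attaches gdb to a hung process and dumps a backtrace
# of every thread (Jeff's suggestion).
#   $1 = PID of the stuck MPI process
backtrace_cmd() {
    printf 'gdb -p %s -batch -ex "thread apply all bt"\n' "$1"
}
```

To use it, source the file and either read the printed command or run it directly, e.g. eval "$(local_run_cmd 2 ./my_application)". Printing rather than executing keeps the helpers harmless on a machine where mpirun or gdb is not installed.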