That fixed it. Looks like something in the latest trunk has triggered this problem.

Greg

On Aug 4, 2008, at 7:58 PM, Ralph Castain wrote:

I see one difference, and it probably does relate to the ticket Terry cited. I always run with -mca btl ^sm since I'm only testing functionality, not performance.

Give that a try and see if it completes. If so, then the problem probably is related to the ticket cited by Terry. Otherwise, we'll have to consider other options.
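
In other words, something like this, reusing your command line from below:

mpirun -mca btl ^sm -np 5 ./shallow

That excludes the shared-memory BTL, so on-node messages fall back to another transport (typically TCP), which is a quick way to tell whether the sm component is involved.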

Ralph

On Aug 4, 2008, at 5:50 PM, Greg Watson wrote:

Configuring with ./configure --prefix=/usr/local/openmpi-1.3-devel --with-platform=contrib/platform/lanl/macosx-dynamic --disable-io-romio

Recompiling the app, then running with mpirun -np 5 ./shallow

All processes show R+ as their status. If I attach gdb to a worker I get the following stack trace:

(gdb) where
#0  0x9045e58a in swtch_pri ()
#1  0x904ccbc1 in sched_yield ()
#2  0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3  0x004bb0bc in opal_condition_wait ()
#4  0x004bca5c in ompi_request_wait_completion ()
#5  0x004bc92a in mca_pml_ob1_send ()
#6  0x003cdcab in MPI_Send ()
#7  0x0000453f in send_updated_ds (res_type=0x5040, jstart=8, jend=11, ds=0xbfff85b0, indx=57, master_id=0) at worker.c:214
#8  0x0000444d in worker () at worker.c:185
#9  0x00002e0b in main (argc=1, argv=0xbffff0b8) at main.c:90

The master process shows:

(gdb) where
#0  0x9045e58a in swtch_pri ()
#1  0x904ccbc1 in sched_yield ()
#2  0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3  0x004ba8bb in opal_condition_wait ()
#4  0x004ba6e4 in ompi_request_wait_completion ()
#5  0x004ba589 in mca_pml_ob1_recv ()
#6  0x003c80aa in MPI_Recv ()
#7  0x0000354c in update_global_ds (res_type=0x5040, indx=57, ds=0xbfffd068) at main.c:257
#8  0x00003334 in main (argc=1, argv=0xbffff0b8) at main.c:195

Seems to be stuck in communication.
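
For reference, the pattern here is just blocking point-to-point: each worker does an MPI_Send of its updated data and the master sits in MPI_Recv collecting it (the send_updated_ds and update_global_ds frames above). A stripped-down sketch of that shape - not the actual shallow code, the buffer size and tag here are made up - is:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double buf[1024] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* master: collect one block from each worker (cf. the update_global_ds frame) */
        for (int src = 1; src < size; src++) {
            MPI_Recv(buf, 1024, MPI_DOUBLE, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        printf("master received %d blocks\n", size - 1);
    } else {
        /* worker: send its block to the master (cf. the send_updated_ds frame) */
        MPI_Send(buf, 1024, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}

Under 1.2 the same exchange in the real code completes fine; with the 1.3 build both sides just spin in opal_progress/sched_yield as the traces show.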

Greg

On Aug 4, 2008, at 6:12 PM, Ralph Castain wrote:

Can you tell us how you are configuring and your command line? As I said, I'm having no problem running your code on my Mac w/10.5, both PowerPC and Intel.

Ralph

On Aug 4, 2008, at 3:10 PM, Greg Watson wrote:

Yes, the application does sends/receives. No, it doesn't seem to be getting past MPI_Init.

I've reinstalled from a completely new 1.3 branch. Still hangs.

Greg

On Aug 4, 2008, at 4:45 PM, Terry Dontje wrote:

Are you doing any communications? Have you gotten past MPI_Init? Could
your issue be related to the following ticket?

https://svn.open-mpi.org/trac/ompi/ticket/1378


--td
Greg Watson wrote:
I'm seeing the same behavior on trunk as 1.3. The program just hangs.

Greg

On Aug 4, 2008, at 2:25 PM, Ralph Castain wrote:

Well, I unfortunately cannot test this right now, Greg - the 1.3 branch won't build due to a problem with the man page installation script. The fix is in the trunk, but hasn't migrated across yet.

:-//

My guess is that you are caught at a point where the hanging bugs hadn't yet been fixed, but you cannot update to the current head of the 1.3 branch because it won't compile. All I can suggest is shifting to the trunk (which definitely works) for now, as the man page fix should migrate soon.

Ralph

On Aug 4, 2008, at 12:12 PM, Ralph Castain wrote:

Depending upon the r-level, there was a problem for a while with the system hanging that was caused by a couple of completely unrelated issues. I believe these have been fixed now - at least, they are fixed on the trunk for me under that same system. I'll check 1.3 now - it could be that some commits are missing over there.


On Aug 4, 2008, at 12:06 PM, Greg Watson wrote:

I have a fairly simple test program that runs fine under 1.2 on Mac OS X 10.5. When I recompile and run it under 1.3 (head of the 1.3 branch) it just hangs.

They are both built using
--with-platform=contrib/platform/lanl/macosx-dynamic. For 1.3, I've
added --disable-io-romio.
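
In other words, roughly the following (the install prefixes are just placeholders):

./configure --prefix=<1.2 prefix> --with-platform=contrib/platform/lanl/macosx-dynamic
./configure --prefix=<1.3 prefix> --with-platform=contrib/platform/lanl/macosx-dynamic --disable-io-romio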

Any suggestions?

Greg