I think that 1.2 is a lost cause in this regard - I thought we were just looking forward on the trunk.
On 6/11/07 8:17 AM, "Brian Barrett" <[email protected]> wrote:

> Yes, this is a known issue.  I don't know -- are we trying to make
> threads work on the 1.2 branch, or just the trunk?  I had thought
> just the trunk?
>
> Brian
>
>
> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>
>> I had similar problems on the trunk, which was fixed by Brian with
>> r14877.
>>
>> Perhaps 1.2 needs something similar?
>>
>> Tim
>>
>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>> infrastructure to do simplistic thread testing.  Specifically, I'm
>>> building the OMPI trunk and v1.2 branches with
>>> "--with-threads --enable-mpi-threads".
>>>
>>> I haven't switched this into my production MTT setup yet, but in the
>>> first trial runs, I'm noticing a segv in the
>>> test/threads/opal_condition program.
>>>
>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>> opal_progress() underneath the condition variable wait, at some point
>>> in there current_base is getting to be NULL.  Hence, the following
>>> segv's because the passed in value of "base" is NULL (event.c):
>>>
>>> int
>>> opal_event_base_loop(struct event_base *base, int flags)
>>> {
>>>     const struct opal_eventop *evsel = base->evsel;
>>>     ...
>>>
>>> Here's the full call stack:
>>>
>>> #0  0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5)
>>>     at event.c:520
>>> #1  0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>> #2  0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>> #3  0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
>>>     at ../../opal/threads/condition.h:81
>>> #4  0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>> #5  0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>> #6  0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>> #7  0x0000000000000000 in ?? ()
>>>
>>> This test seems to work fine on the trunk (at least, it didn't segv
>>> in my small number of trial runs).
>>>
>>> Is this a known problem in the 1.2 branch?  Should I skip the thread
>>> testing on the 1.2 branch and concentrate on the trunk?
>> _______________________________________________
>> devel mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
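
For anyone reading the backtrace in the quoted report, the failure mode can be reproduced in isolation. What follows is only a rough sketch, not the OMPI code and not what r14877 does: every `sketch_*` name is a hypothetical stand-in for the real types in event.c. It shows why a NULL base passed into the loop function faults on the very first `base->evsel` read, and what a defensive check at that point might look like.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real event types; not the OMPI definitions. */
struct sketch_eventop {
    const char *name;
};

struct sketch_event_base {
    const struct sketch_eventop *evsel;
};

/* Stand-in for the global that the report says ends up NULL. */
static struct sketch_event_base *sketch_current_base = NULL;

/* Returns 0 if the loop could run, -1 if there is no usable base. */
int sketch_event_base_loop(struct sketch_event_base *base, int flags)
{
    (void)flags; /* unused in this sketch */

    /* Without this check, reading base->evsel with base == NULL
     * dereferences a null pointer, which is the segv in frame #0. */
    if (base == NULL || base->evsel == NULL) {
        return -1;
    }

    printf("running event loop with backend %s\n", base->evsel->name);
    return 0;
}

/* Mirrors frame #1: forwards whatever the global currently holds. */
int sketch_event_loop(int flags)
{
    return sketch_event_base_loop(sketch_current_base, flags);
}
```

Calling `sketch_event_loop(5)` while the global is still NULL returns -1 instead of crashing. A guard like this would only hide the symptom, though; it says nothing about why the base goes NULL under threads in the first place.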
