I would second this - thread safety should be a 1.3 item, unless someone has a lot of spare time.
Rich

-----Original Message-----
From: devel-boun...@open-mpi.org <devel-boun...@open-mpi.org>
To: Open MPI Developers <de...@open-mpi.org>
Sent: Mon Jun 11 10:44:33 2007
Subject: Re: [OMPI devel] threaded builds

On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

> I leave it to the thread subgroup to decide... Should we discuss on
> the call tomorrow?
>
> I don't have a strong opinion; I was just testing both because it was
> easy to do so. If we want to concentrate on the trunk, I can adjust
> my MTT setup.
>

I think trying to worry about 1.2 would just be a time sink. We know
that there are architectural issues with threads in some parts of the
code. I don't see us re-architecting 1.2 in this regard.
Seems we should only focus on the trunk.

- Galen

>
> On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:
>
>> Yes, this is a known issue. I don't know -- are we trying to make
>> threads work on the 1.2 branch, or just the trunk? I had thought
>> just the trunk?
>>
>> Brian
>>
>>
>> On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:
>>
>>> I had similar problems on the trunk, which was fixed by Brian with
>>> r14877.
>>>
>>> Perhaps 1.2 needs something similar?
>>>
>>> Tim
>>>
>>> On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
>>>> Per the teleconf last week, I have started to revamp the Cisco MTT
>>>> infrastructure to do simplistic thread testing. Specifically, I'm
>>>> building the OMPI trunk and v1.2 branches with
>>>> "--with-threads --enable-mpi-threads".
>>>>
>>>> I haven't switched this into my production MTT setup yet, but in the
>>>> first trial runs, I'm noticing a segv in the
>>>> test/threads/opal_condition program.
>>>>
>>>> It seems that in the thr1 test on the v1.2 branch, when it calls
>>>> opal_progress() underneath the condition variable wait, at some point
>>>> in there current_base is getting to be NULL. Hence, the following
>>>> segv's because the passed in value of "base" is NULL (event.c):
>>>>
>>>> int
>>>> opal_event_base_loop(struct event_base *base, int flags)
>>>> {
>>>>     const struct opal_eventop *evsel = base->evsel;
>>>>     ...
>>>>
>>>> Here's the full call stack:
>>>>
>>>> #0 0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5) at event.c:520
>>>> #1 0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
>>>> #2 0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
>>>> #3 0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600) at ../../opal/threads/condition.h:81
>>>> #4 0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
>>>> #5 0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
>>>> #6 0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
>>>> #7 0x0000000000000000 in ?? ()
>>>>
>>>> This test seems to work fine on the trunk (at least, it didn't segv
>>>> in my small number of trial runs).
>>>>
>>>> Is this a known problem in the 1.2 branch? Should I skip the thread
>>>> testing on the 1.2 branch and concentrate on the trunk?
>>> _______________________________________________
>>> devel mailing list
>>> de...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> --
> Jeff Squyres
> Cisco Systems
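
For readers following the crash described above: frame #0 shows base=0x0, and the first statement of the quoted function reads base->evsel, so the NULL dereference is immediate. The fragment below is only a minimal sketch of that failure mode with a defensive check added. The types and the function name (eventop_sketch, event_base_sketch, base_loop_sketch) are hypothetical stand-ins, not the real definitions from event.c, and the check is not the change made in r14877. It would turn the segv into an error return; it does not explain or fix why current_base becomes NULL in the first place.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real structures in event.c.
 * Only the one field needed to show the failure mode is modeled. */
struct eventop_sketch {
    const char *name;
};

struct event_base_sketch {
    const struct eventop_sketch *evsel;
};

/* Same shape as the quoted opal_event_base_loop(): the original reads
 * base->evsel as its first statement, which faults when base is NULL.
 * This sketch (an assumption, not the actual fix) checks the pointer
 * first and reports an error instead of crashing. */
int base_loop_sketch(struct event_base_sketch *base, int flags)
{
    const struct eventop_sketch *evsel;

    if (base == NULL) {
        return -1;              /* caller passed no event base */
    }

    evsel = base->evsel;        /* safe: base is known to be non-NULL */
    (void)flags;                /* flags are unused in this sketch */

    return (evsel != NULL) ? 0 : -1;
}
```

Calling base_loop_sketch(NULL, 5), the same arguments as in frame #0 of the backtrace, returns -1 rather than faulting. A guard like this only hides the symptom, so the underlying question of which thread clears the base pointer would still need to be answered.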