On Jun 11, 2007, at 8:25 AM, Jeff Squyres wrote:

I leave it to the thread subgroup to decide...  Should we discuss on
the call tomorrow?

I don't have a strong opinion; I was just testing both because it was
easy to do so.  If we want to concentrate on the trunk, I can adjust
my MTT setup.


I think trying to worry about 1.2 would just be a time sink. We know that there are architectural issues with threads in some parts of the code. I don't see us re-architecting 1.2 in this regard.
It seems we should focus only on the trunk.


- Galen



On Jun 11, 2007, at 10:17 AM, Brian Barrett wrote:

Yes, this is a known issue.  I don't know -- are we trying to make
threads work on the 1.2 branch, or just the trunk?  I had thought
just the trunk?

Brian


On Jun 11, 2007, at 8:13 AM, Tim Prins wrote:

I had similar problems on the trunk, which Brian fixed in r14877.

Perhaps 1.2 needs something similar?

Tim

On Monday 11 June 2007 10:08:15 am Jeff Squyres wrote:
Per the teleconf last week, I have started to revamp the Cisco MTT
infrastructure to do simplistic thread testing.  Specifically, I'm
building the OMPI trunk and v1.2 branches with
"--with-threads --enable-mpi-threads".

I haven't switched this into my production MTT setup yet, but in the
first trial runs, I'm noticing a segv in the test/threads/opal_condition
program.

It seems that in the thr1 test on the v1.2 branch, when it calls
opal_progress() underneath the condition variable wait, current_base
becomes NULL at some point.  Hence, the following code segv's because
the value passed in as "base" is NULL (event.c):

int
opal_event_base_loop(struct event_base *base, int flags)
{
    /* segv here: base is NULL, so base->evsel is an invalid read */
    const struct opal_eventop *evsel = base->evsel;
...
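To make the failure mode concrete, here is a small self-contained sketch. All names in it (eventop_sketch, event_base_sketch, current_base_sketch, loop_original_pattern, loop_guarded, loop_default) are hypothetical simplifications, not the real OMPI event code, and the NULL guard is just one possible defensive check; it is not necessarily what r14877 changed on the trunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the structures in
 * event.c. These are NOT the real OMPI definitions. */
struct eventop_sketch {
    const char *name;
};

struct event_base_sketch {
    const struct eventop_sketch *evsel;
};

/* Process-wide default base, analogous in spirit to current_base.
 * It stays NULL until some initialization routine assigns it. */
struct event_base_sketch *current_base_sketch = NULL;

/* Same shape as the excerpt above: the first real statement
 * dereferences base. The assert documents the precondition; when
 * NDEBUG is defined it compiles away and a NULL base is dereferenced,
 * crashing the process as in the backtrace. */
int loop_original_pattern(struct event_base_sketch *base)
{
    assert(base != NULL);
    const struct eventop_sketch *evsel = base->evsel;
    return evsel != NULL ? 0 : -1;
}

/* Defensive variant: report an error instead of dereferencing a
 * NULL base. Returns 0 if a backend is attached, -1 otherwise. */
int loop_guarded(struct event_base_sketch *base)
{
    if (base == NULL || base->evsel == NULL) {
        return -1;
    }
    return 0;
}

/* Wrapper loosely mirroring opal_event_loop(): it always uses the
 * global default base, which is where a NULL can sneak in. */
int loop_default(void)
{
    return loop_guarded(current_base_sketch);
}
```

With a guard like this, a caller such as opal_progress() would presumably get an error code instead of a crash, but that would only mask the real question of why current_base is NULL on the v1.2 branch in the first place.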

Here's the full call stack:

#0  0x0000002a955a020e in opal_event_base_loop (base=0x0, flags=5)
    at event.c:520
#1  0x0000002a955a01f9 in opal_event_loop (flags=5) at event.c:514
#2  0x0000002a95599111 in opal_progress () at runtime/opal_progress.c:259
#3  0x00000000004012c8 in opal_condition_wait (c=0x5025a0, m=0x502600)
    at ../../opal/threads/condition.h:81
#4  0x0000000000401146 in thr1_run (obj=0x503110) at opal_condition.c:46
#5  0x00000036e290610a in start_thread () from /lib64/tls/libpthread.so.0
#6  0x00000036e1ec68c3 in clone () from /lib64/tls/libc.so.6
#7  0x0000000000000000 in ?? ()

This test seems to work fine on the trunk (at least, it didn't segv
in my small number of trial runs).

Is this a known problem in the 1.2 branch? Should I skip the thread
testing on the 1.2 branch and concentrate on the trunk?
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel



--
Jeff Squyres
Cisco Systems

