On Oct 23, 2007, at 6:33 AM, Bogdan Costescu wrote:

There is in the openib BTL.

One of the answers to bug #1025 contains the following phrase:

"It looks like this will affect many threading issues with the
pathscale compiler -- the openib BTL is simply the first place we
tripped it."

which along with the rest of the data (failure dependency on TLS
usage) led me to wonder about threading issues.

FWIW, these problems even affect non-threaded builds, so I'm not entirely sure what the problem is. All indications point to a problem in the Pathscale compiler, but who knows -- perhaps we're doing something stupid that doesn't show up in any other compiler.

To be honest, I removed the pathscale suite from my regular
regression testing

So, is anyone else testing PathScale 3.0 with stable versions of Open
MPI? Or with development versions?

I don't know; Cisco is not. I removed it from my normal testing set because all IB testing would fail -- so it wasn't worth testing.

I just recompiled the OMPI 1.2 branch with pathscale 3.0 on RHEL4U4
and I do not see the problems that you are seeing.  :-\  Is Debian
etch a supported pathscale platform?

Seems like it's not... And indeed the older RHEL4 is a supported
platform, which might explain the different results.

You might want to ask them if Debian etch is supported.

I made some progress: if I configure with "--without-memory-manager"
(along with all other options that I mentioned before), then it works.
This was inspired by the fact that the segmentation fault occurred in
ptmalloc2. I have previously tried to remove the MX support without
any effect; with ptmalloc2 out of the picture I have had test runs
over MX and TCP without problems.

This is ringing a [very] distant bell in my memory, but I don't remember the details. Brian: do you remember any specific issues with the memory manager and the pathscale compiler?

--
Jeff Squyres
Cisco Systems
