If you suspect compiler alignment issues, one thing you might check is
the output of "ompi_info --all", to see what Apple used to
configure/build OMPI. We save the CFLAGS and whatnot; they may be
helpful to you...?
I see on my MBP/Leopard 10.5.1, for example:
C compiler absolute: /usr/bin/gcc
...
Build CFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions -fno-strict-aliasing
Build CXXFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions
Build FFLAGS:
Build FCFLAGS:
Build LDFLAGS: -export-dynamic -Wl,-u,_munmap -Wl,-multiply_defined,suppress
Build LIBS: -lutil
Wrapper extra CFLAGS:
Wrapper extra CXXFLAGS:
Wrapper extra FFLAGS:
Wrapper extra FCFLAGS:
Wrapper extra LDFLAGS: -Wl,-u,_munmap -Wl,-multiply_defined,suppress
Wrapper extra LIBS: -lutil
I'll *guess* that the -Wl options came from OMPI's normal configure
script. But the -arch and -f might have come from Apple...?
That being said, I'm *not* sure how this information relates to the
universal binaries... It *may* be that you'll see the different
options for the different architectures depending on which machine you
run "ompi_info" on...? I don't know enough about how universal
binaries are built or run to know.
On Jan 24, 2008, at 1:12 PM, Ralph H Castain wrote:
Appreciate the clarification. I am unaware of anyone attempting that
procedure in the past, but I'm not terribly surprised to hear it would
encounter problems and/or fail. Given the myriad of configuration
options in the code base, it would seem almost miraculous that you
could either (a) hit the same config options used by Apple (whatever
they were), or (b) manage to find a combination that matched enough to
let you do this without problem.
Frankly, I'm surprised even this small a fix would let you work
around the problems... ;-)
Unless you have some overriding reason to use the shipped binaries for
everything other than this special component, you're probably going to
have a lot more success just rebuilding from source.
But that's just an opinion - either way, good luck with your efforts!
Ralph
On 1/24/08 10:54 AM, "Dean Dauger, Ph. D." <d...@daugerresearch.com> wrote:
I'm sorry, but now I am totally confused. Are you saying that you are
having problems with the default rsh component in the distributed 1.2.3
code??
Yes ...
Or are you having a problem with your customized version?
and yes. Each exhibited the same problem, a bus error.
What compiler are you using? If it's your customized version, did you
make sure to change the names of the data structures and modules as I
pointed out?
gcc 4.0.1, the default of Leopard. Yes, in the customized version, I
did change the names of the data structures, subroutines, support
file names, and where it says "rsh" just like you said.
We regularly work on Macs, both PPC and Intel based (I develop and test
on both every day), and I have -never- seen this problem in our code
base. Hence my confusion.
I'm sorry to confuse. I'm starting with the shipping Mac OS X 10.5.1
"Leopard", which contains its own build of Open MPI (v1.2.3 according
to "orterun -version"). So I assumed that the v1.2.3 branch from
svn.open-mpi.org was the same code Apple used to build the Open MPI
that ships in Leopard.
My motivation was to build a new pls module based on pls_rsh module's
source code, substituting the rsh with my own name like you said, but
I encountered a bus error. So to be sure I didn't screw up somewhere
in my custom module I rebuilt the unmodified pls_rsh module and
discovered the same problem.
Then, after downloading the Open MPI from opensource.apple.com
(suspecting it was different), I tried recompiling the pls_rsh module
from that source code, dropped in just the resulting mca_pls_rsh.la
and mca_pls_rsh.so into the existing /usr/lib/openmpi of Leopard,
overwriting Leopard's versions, and the bus error happened the same
as before.
That's where I was with my first post to this list.
My last post regards the discovery that rearranging the elements of
orte_pls_rsh_component_t, without changing anything else about the
pls_rsh code, affects the bus error outcome. Then I padded out
orte_pls_rsh_component_t and my "orte_pls_dean_component_t" by hand
so that it would be "data alignment agnostic", if you will.
Consequently the bus error no longer occurs and both pls modules now
run as they should.
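To make the "padded by hand" idea concrete, here is a minimal sketch in
C. It is only an illustration under stated assumptions: the struct and
field names are hypothetical (they are not the real members of
orte_pls_rsh_component_t), and "#pragma pack(1)" is used purely as a
stand-in for "some other build that disagrees about alignment"; nothing
in this thread establishes what Apple's build actually did. The naive
field order leaves compiler-inserted gaps, so the two builds disagree on
offsets; the hand-padded version fills every gap explicitly, so both
agree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real orte_pls_rsh_component_t. */

/* Naive field order, compiled with the compiler's natural alignment. */
struct naive_natural {
    char    flag;
    void   *handler;
    int32_t count;
};

/* Same declaration, compiled as if a build forced 1-byte packing
 * (an assumed stand-in for "a build with different alignment rules"). */
#pragma pack(push, 1)
struct naive_packed {
    char    flag;
    void   *handler;
    int32_t count;
};
#pragma pack(pop)

/* Hand-padded: widest member first, explicit filler for the tail gap,
 * so the compiler has no padding left to insert on its own. */
struct padded_natural {
    void   *handler;
    int32_t count;
    char    flag;
    char    pad[3];
};

#pragma pack(push, 1)
struct padded_packed {
    void   *handler;
    int32_t count;
    char    flag;
    char    pad[3];
};
#pragma pack(pop)

/* Returns 1 if both builds agree on every offset and on the total size. */
int naive_layouts_match(void)
{
    return offsetof(struct naive_natural, handler) == offsetof(struct naive_packed, handler)
        && offsetof(struct naive_natural, count)   == offsetof(struct naive_packed, count)
        && sizeof(struct naive_natural)            == sizeof(struct naive_packed);
}

/* Same comparison for the hand-padded variant. */
int padded_layouts_match(void)
{
    return offsetof(struct padded_natural, handler) == offsetof(struct padded_packed, handler)
        && offsetof(struct padded_natural, count)   == offsetof(struct padded_packed, count)
        && offsetof(struct padded_natural, flag)    == offsetof(struct padded_packed, flag)
        && sizeof(struct padded_natural)            == sizeof(struct padded_packed);
}

/* Aborts if the hand-padded layout ever stops being alignment-agnostic. */
void assert_padded_layout_is_stable(void)
{
    assert(padded_layouts_match());
}
```

Calling assert_padded_layout_is_stable() once at module start-up would
catch an accidental reordering of the fields. It cannot, of course,
detect what an already-shipped binary expects; that still has to be
found out the hard way.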
My hypothesis: Apple's procedure to build Open MPI into Leopard had a
side effect requiring shared object code structures to follow a data
alignment different than if I simply recompile Open MPI straight from
its source.
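As a rough illustration of why such a mismatch would misbehave at run
time, the sketch below copies the bytes of a struct written under one
layout into the same declaration compiled under another. Everything here
is assumed for the sake of the example: the struct names, the field
names, and the use of "#pragma pack(1)" to simulate the other build. If
the misread field were a pointer rather than a counter, dereferencing it
could plausibly produce a bus error like the one described.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXPECTED_COUNT 0x11223344

/* Layout used by the (hypothetical) host library: 1-byte packing. */
#pragma pack(push, 1)
struct host_view {
    char    flag;
    int32_t count;
};
#pragma pack(pop)

/* The same declaration as seen by a separately compiled module that
 * uses the compiler's natural alignment. */
struct module_view {
    char    flag;
    int32_t count;
};

/* Fill the struct the way the host would, then return the count the
 * module would read from those same bytes. */
int32_t count_seen_by_module(void)
{
    struct host_view   host;
    struct module_view module;

    host.flag  = 1;
    host.count = EXPECTED_COUNT;

    /* The module's view must be at least as large as the host's bytes. */
    assert(sizeof host <= sizeof module);

    memset(&module, 0, sizeof module);
    memcpy(&module, &host, sizeof host);  /* same bytes, different layout */

    return module.count;                  /* read at the module's offset */
}
```

On a typical platform where int32_t is 4-byte aligned, the value
returned is not EXPECTED_COUNT, because the module looks for "count" at
a different offset than the one the host wrote it to.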
I'm not saying anyone is to blame, but I'm recognizing that those
builds have different timelines. I predict that if I overwrote all
of Leopard's Open MPI object code with my own build, it would all run
too.
For my needs, I have a sufficient workaround: realign my data
structures to be "agnostic". I'm sharing this little discovery just
in case it might help somebody else out there; for all I know it
could happen on non-Macs too.
Thanks,
Dean
_______________________________________________
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
--
Jeff Squyres
Cisco Systems