The machine in question is a dual-processor PII-350 at home.
I am running linux-2.2.11 with SMP enabled in a configuration
which has been great in all other respects (I don't suspect the kernel
as the source of the problem).
I built MPICH with the -device=ch_shmem and -arch=LINUX options.
It checked out on the test suite and computed pi fine
with the little cpi program included in the distribution. It also
ran the little programs I wrote when I was teaching myself
MPI (glorified 'Hello world's).
Now I am trying to do real work with it, namely run an N-body code
that I have written. It is a testbed for a parallel treecode that I
have under development for investigating various issues in cosmology and
galaxy formation. It's written from scratch in C and uses dynamic memory
allocation (which is what I am writing to ask about).
I should note that the code works fine on a multi-processor SGI
(some other version of MPI, the details of which are out of my control)
and has in the past worked beautifully on a small 'wulf that my advisor
built for development.
What I find is that when I mpirun the program I get

    Child process died unexpectedly from signal 11

Examining the debugging output is leading me to question whether
the program might be allocating memory incorrectly.
There is an array "allparts" which is allocated on the master
node based on the contents of an initialization file. On the
remaining nodes memory is allocated for it using malloc.
The program is dying on the invocation of an MPI_Bcast which
duplicates the particle data on the slave nodes.
The relevant code lines look something like this:
/* allparts for node 0 is initialized and filled out on the basis of a file */
/* we dump the allparts pointers from all nodes */
if (my_rank != 0) {
    allparts = malloc(NUMPARTS * sizeof(particle));
}
/* we dump the allparts pointers from all nodes again for comparison */
MPI_Bcast(allparts, NUMPARTS, particle_type_ptr, 0, MPI_COMM_WORLD);
/* program dies here */
What I find when I look at these dumps is that malloc appears to be
returning the same chunk of memory to all three slave nodes!
4 0: before allocation allparts 404cf008 8 74
^node number ^^^ this is the hex address allparts
4 1: before allocation allparts 0 8 74
4 2: before allocation allparts 0 8 74
4 3: before allocation allparts 0 8 74
5 0: allocate allparts
5 1: allocate allparts
5 2: allocate allparts
5 3: allocate allparts
6 0: after allocation allparts 404cf008 8 74
^^^ this is the hex address allparts
6 1: after allocation allparts 8081fc8 8 74
6 2: after allocation allparts 8081fc8 8 74
6 3: after allocation allparts 8081fc8 8 74
Why might this be happening? Is it intentional? In a shmem context, is
there a special library that I can link against that will allocate memory
independently for the different instances of the code? Is
ch_shmem even the right way for me to be running MPI on an SMP Linux
box?
BTW, I tried building MPICH with the default ch_p4 device, but I get a
buffer space error when I try anything!
If anyone has any wisdom on the issue, I'd appreciate it.
Thanks!
Chris
Who are you? What do you want? Why are you here?
.... Where are you going?
------------------------ Chris Gottbrath ------------------------
http://agave.as.arizona.edu/~chrisg/ [EMAIL PROTECTED]
-
Linux SMP list: FIRST see FAQ at http://www.irisa.fr/prive/mentre/smp-faq/
To Unsubscribe: send "unsubscribe linux-smp" to [EMAIL PROTECTED]