On Wed, Nov 9, 2011 at 5:10 PM, <[email protected]> wrote:

> MVAPICH-1.1 is based on mpich-1.2.7 (that's mpich, not mpich2: it's 7
> years old).
>
> You should probably check with valgrind just to make sure you are not
> doing anything bad with memory. Probably you are ok in that regard,
> but valgrind will tell you for sure (mpiexec -np whatever valgrind
> --log-file=myprogram.%p.vg myprogram )
>

Thanks for this tip... I'll double-check this at some point. I have
laboriously quadruple-checked the code for issues like these, so I doubt it
will turn up anything, but it can't hurt to have a look.

>
> Since you are stuck with mvapich-1.1 you will have to go out of your
> way a bit to make collective writes work:
>
> - you know which processors have data and which ones do not (or else
>  you would not be able to call h5sselect_none_f).
>
> - With this information you can call MPI_COMM_SPLIT
>
> MPI_COMM_SPLIT(COMM, COLOR, KEY, NEWCOMM, IERROR)
> INTEGER COMM, COLOR, KEY, NEWCOMM, IERROR
>
> where "color" would be either "have data" or "don't have data".
>
> Processors that don't have data get to sit out this iteration.
>
> Processors that have data participate in collective I/O: instead of
> passing in MPI_COMM_WORLD, pass in the NEWCOMM from MPI_COMM_SPLIT.
>
> I suspect the "don't have data" processors change from iteration to
> iteration.  I guess you'll have to do the benchmark to see if "open,
> write collectively, close" is still faster than "write independently".
>
>
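
A rough sketch of what Rob describes, using the Fortran bindings (the
have_data flag below is a placeholder for whatever the input file actually
specifies, and error checking is omitted):

  program split_sketch
    implicit none
    include 'mpif.h'

    integer :: color, newcomm, myrank, ierr
    logical :: have_data

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

    ! Placeholder for "does this rank intersect the slice?"; in the real
    ! code this is known once the input file has been read.
    have_data = (mod(myrank, 2) == 0)

    if (have_data) then
       color = 1
    else
       color = MPI_UNDEFINED   ! these ranks get MPI_COMM_NULL back
    end if

    ! Using the world rank as the key keeps the rank ordering in newcomm
    ! the same as in MPI_COMM_WORLD.
    call MPI_Comm_split(MPI_COMM_WORLD, color, myrank, newcomm, ierr)

    if (newcomm /= MPI_COMM_NULL) then
       ! Pass newcomm (instead of MPI_COMM_WORLD) to h5pset_fapl_mpio_f,
       ! open the file, do the collective write, close the file, then:
       call MPI_Comm_free(newcomm, ierr)
    end if

    call MPI_Finalize(ierr)
  end program split_sketch
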
Actually, which processors have data to write is known as soon as the
simulation reads the input file. The problem is that we are outputting
planar slices of our 3D domain at high frequency, and the number of slices
and their orientation, position, extent, and subsampling are all specified
in the input file. The number of planes we write might be as many as ten,
and taken together they could involve every single MPI rank. These planes
are written every 10 to 40 iterations (restart dumps are typically produced
after at least 2000 iterations). So we could use MPI_COMM_SPLIT to generate
a communicator for each slice at the beginning of the program execution,
but closing and reopening the file with the new communicator O(10) times
every 10-40 iterations sounds like it could get costly. We need to be able
to write all the planes in ~10 seconds in order to avoid serious
load-balancing issues.
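
If we went the split-once-at-startup route, a rough sketch would look like
this (nplanes and rank_touches_plane are hypothetical names; in our code
that information comes from the input file):

  subroutine build_plane_comms(nplanes, rank_touches_plane, plane_comm)
    implicit none
    include 'mpif.h'
    integer, intent(in)  :: nplanes
    logical, intent(in)  :: rank_touches_plane(nplanes)  ! from the input file
    integer, intent(out) :: plane_comm(nplanes)
    integer :: color, myrank, ierr, p

    call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)
    do p = 1, nplanes
       if (rank_touches_plane(p)) then
          color = 1
       else
          color = MPI_UNDEFINED   ! this rank gets MPI_COMM_NULL for plane p
       end if
       call MPI_Comm_split(MPI_COMM_WORLD, color, myrank, plane_comm(p), ierr)
    end do
    ! At output time, only ranks with plane_comm(p) /= MPI_COMM_NULL would
    ! open the file on that communicator and join the collective write for
    ! plane p; the communicators are freed once at shutdown.
  end subroutine build_plane_comms
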

I think that for the time being I should just pick planes that don't cause
this error to pop up (so far the data that we REALLY care about doesn't
trigger this issue... I can do collective I/O on some slices even if they
have ranks with null selections), and then at a later date work on
migrating the code base to be compatible with more modern mpich/mvapich
releases.


> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>


Izaak Beekman
===================================
(301)244-9367
Princeton University Doctoral Candidate
Mechanical and Aerospace Engineering
[email protected]

UMD-CP Visiting Graduate Student
Aerospace Engineering
[email protected]
[email protected]