On Tue, Dec 14, 2010 at 6:40 AM, Quincey Koziol <[email protected]> wrote:

> Hi Leigh,
>
> On Dec 9, 2010, at 11:57 AM, Leigh Orf wrote:
>
> > Thanks for the information. After I sent my email I realized I left out
> > some relevant information. I am not using pHDF5, just regular HDF5, but in
> > a parallel environment. The only reason I am doing this is that I want the
> > ability to write compressed HDF5 files (gzip, szip, scale-offset, nbit,
> > etc.). As I understand it, at this point (and maybe forever) pHDF5 cannot
> > do compression.
>
>         We are working toward it, but it's going to be about a year away.
>
> > I currently have tried two approaches with compression and HDF5 in a
> > parallel environment: (1) each MPI rank writes its own compressed HDF5
> > file; (2) I create a new MPI communicator (call it subcomm) which operates
> > on a sub-block of the entire domain. Each instance of subcomm (which
> > could, for instance, operate on one multicore chip) does an MPI_GATHER to
> > rank 0 of subcomm, and that root core does the compression and writes to
> > disk. The problem with (1) is that there are too many files with large
> > simulations; the problem with (2) is that rank 0 is operating on a lot of
> > data and the compression code slows things down dramatically - rank 0
> > cranks away while the other ranks wait at a barrier. So I am trying a
> > third approach where you still have subcomm, but instead of doing the
> > MPI_GATHER, each core writes, in round-robin fashion, to the file created
> > by rank 0 of subcomm. I am hoping that I'll get the benefits of
> > compression (being done in parallel) without suffering a huge penalty for
> > the round-robin approach.
> >
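
For concreteness, here is a minimal sketch of that third approach - one HDF5
file per subcomm, gzip (deflate) compression, and a token passed around the
subcomm so that the writes are serialized. The 32-rank subcomm size, dataset
names, and array sizes below are only placeholders:

/* Sketch only: one compressed HDF5 file per subcomm, written round-robin.
 * Rank 0 of each subcomm creates the file; each rank then takes a turn
 * opening it, writing its own gzip-compressed dataset, and closing it
 * before handing a token to the next rank.  Sizes/names are placeholders. */
#include <mpi.h>
#include <hdf5.h>
#include <stdio.h>

#define RANKS_PER_FILE 32          /* e.g. one file per 32-core MCM */
#define N (1024 * 1024)            /* elements per rank (placeholder) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int wrank;
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);

    /* Sub-communicator covering one block of the domain */
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, wrank / RANKS_PER_FILE, wrank, &subcomm);
    int srank, ssize;
    MPI_Comm_rank(subcomm, &srank);
    MPI_Comm_size(subcomm, &ssize);

    char fname[64];
    snprintf(fname, sizeof fname, "block_%06d.h5", wrank / RANKS_PER_FILE);

    static float data[N];          /* this rank's piece of the domain */
    hsize_t dims[1] = { N };

    if (srank == 0) {              /* subcomm rank 0 creates the file */
        hid_t f = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        H5Fclose(f);
    }

    /* Round-robin: wait for the previous rank's token, write, pass it on */
    int token = 0;
    if (srank > 0)
        MPI_Recv(&token, 1, MPI_INT, srank - 1, 0, subcomm, MPI_STATUS_IGNORE);

    hid_t f     = H5Fopen(fname, H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, dims);   /* compression requires chunked layout */
    H5Pset_deflate(dcpl, 4);       /* gzip level 4 */
    char dname[32];
    snprintf(dname, sizeof dname, "rank_%04d", srank);
    hid_t dset = H5Dcreate2(f, dname, H5T_NATIVE_FLOAT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);
    H5Dclose(dset); H5Pclose(dcpl); H5Sclose(space); H5Fclose(f);

    if (srank < ssize - 1)
        MPI_Send(&token, 1, MPI_INT, srank + 1, 0, subcomm);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
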
> > If there were a way to do compressed pHDF5 I'd just do a hybrid approach
> > where each subcomm root node wrote (in parallel) to its HDF5 file. In this
> > case, I would presume that the computationally expensive compression
> > algorithms would be parallelized efficiently. Our goal is to reduce the
> > number of compressed HDF5 files - not all the way to 1 file, but not 1
> > file per MPI rank. We are not using OpenMP and probably will not be in the
> > future.
>
>         The primary problem is the space allocation that has to happen when
> data is compressed.  This is particularly a problem when performing
> independent I/O, since the other processes aren't involved, but [eventually]
> need to know about space that was allocated.  Collective I/O is easier, but
> will still require changes to HDF5, etc.  Do you want to use collective or
> independent I/O for your dataset writes?
>


Quincey,

Probably a combination of both; the ideal situation would be a group of MPI
ranks collectively writing one compressed HDF5 file. On Blue Waters, a
100,000-core run with 32 cores per MCM would therefore result in roughly
3,000 files (100,000 / 32 ≈ 3,125), which is not unreasonable.
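
Just so we're talking about the same knob: I assume you mean the per-transfer
choice below. This is a rough pHDF5 sketch (uncompressed, since compression
isn't available on that path yet), with placeholder file, dataset, and
hyperslab sizes - each rank writes one row of a shared dataset, and the
collective vs. independent decision sits on the transfer property list:

/* Sketch of an uncompressed pHDF5 write, showing where the collective vs.
 * independent choice is made.  Names and sizes are placeholders. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* All ranks open one shared file through the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One row of the dataset per rank */
    hsize_t dims[2] = { (hsize_t)nprocs, 1024 };
    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate2(file, "field", H5T_NATIVE_FLOAT, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    hsize_t start[2] = { (hsize_t)rank, 0 }, count[2] = { 1, 1024 };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    /* Collective vs. independent is chosen per transfer, right here */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE); /* or H5FD_MPIO_INDEPENDENT */

    float buf[1024] = { 0 };                      /* this rank's row */
    H5Dwrite(dset, H5T_NATIVE_FLOAT, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}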

Maybe I'm thinking about this too simply, but couldn't you compress the data
on each MPI rank, save it in a buffer, calculate the space required, and then
write it? I don't know enough about the internal workings of HDF5 to know
whether that would fit in the HDF5 model. In our particular application on
Blue Waters, memory is cheap, so there is lots of room in memory for
buffering data.
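
Roughly what I am picturing, per rank, is sketched below. zlib here is just a
stand-in for whatever filter HDF5 would apply internally, and the buffer
sizes are placeholders; the point is only that each rank could know its exact
compressed size before any file space has to be allocated:

/* Sketch only: compress this rank's block into a memory buffer first, so
 * the exact on-disk space is known before any write happens.  zlib is a
 * stand-in for the HDF5 filter pipeline; sizes are placeholders. */
#include <mpi.h>
#include <zlib.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const uLong nbytes = 4UL * 1024 * 1024;   /* this rank's raw block size */
    Bytef *raw = calloc(nbytes, 1);           /* ...filled by the model */

    uLong bound = compressBound(nbytes);      /* worst-case compressed size */
    Bytef *packed = malloc(bound);
    uLongf packed_len = bound;

    /* deflate at level 4; packed_len comes back as the actual size */
    if (compress2(packed, &packed_len, raw, nbytes, 4) != Z_OK) {
        fprintf(stderr, "rank %d: compression failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* packed_len is the exact number of bytes this rank needs in the file;
     * ranks could exchange these (e.g. with MPI_Exscan) to compute
     * non-overlapping file offsets before anyone writes. */
    printf("rank %d: %lu -> %lu bytes\n", rank,
           (unsigned long)nbytes, (unsigned long)packed_len);

    free(packed);
    free(raw);
    MPI_Finalize();
    return 0;
}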

Leigh


>        Quincey
>
>



-- 
Leigh Orf
Associate Professor of Atmospheric Science
Department of Geology and Meteorology
Central Michigan University
Currently on sabbatical at the National Center for Atmospheric Research
in Boulder, CO
NCAR office phone: (303) 497-8200
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
