Hi Leigh,

On Dec 14, 2010, at 4:52 PM, Leigh Orf wrote:
> On Tue, Dec 14, 2010 at 6:40 AM, Quincey Koziol <[email protected]> wrote:
>
> > Hi Leigh,
> >
> > On Dec 9, 2010, at 11:57 AM, Leigh Orf wrote:
> >
> > > Thanks for the information. After I sent my email I realized I left out
> > > some relevant information. I am not using pHDF5 but regular HDF5, albeit in a
> > > parallel environment. The only reason I am doing this is that I want the
> > > ability to write compressed HDF5 files (gzip, szip, scale-offset, nbit,
> > > etc.). As I understand it, at this point (and maybe forever) pHDF5 cannot
> > > do compression.
> >
> > We are working toward it, but it's going to be about a year away.
> >
> > > I currently have tried two approaches with compression and HDF5 in a
> > > parallel environment: (1) Each MPI rank writes its own compressed HDF5
> > > file. (2) I create a new MPI communicator (call it subcomm) which operates
> > > on a sub-block of the entire domain. Each instance of subcomm (which could,
> > > for instance, operate on one multicore chip) does an MPI_GATHER to rank 0 of
> > > subcomm, and that root core does the compression and writes to disk. The
> > > problem with (1) is that there are too many files with large simulations; the
> > > problem with (2) is that rank 0 is operating on a lot of data and the
> > > compression code slows things down dramatically - rank 0 cranks away while
> > > the other ranks are at a barrier. So I am trying a third approach where you
> > > still have subcomm, but instead of doing the MPI_GATHER, each core writes,
> > > in a round-robin fashion, to the file created by rank 0 of subcomm. I am
> > > hoping that I'll get the benefits of compression (being done in parallel)
> > > and not suffer a huge penalty for the round-robin approach.
> > >
> > > If there were a way to do compressed pHDF5 I'd just do a hybrid approach
> > > where each subcomm root node wrote (in parallel) to its HDF5 file. In this
> > > case, I would presume that the computationally expensive compression
> > > algorithms would be parallelized efficiently. Our goal is to reduce the
> > > number of compressed HDF5 files - not all the way to 1 file, but not 1 file
> > > per MPI rank. We are not using OpenMP and probably will not be in the
> > > future.
> >
> > The primary problem is the space allocation that has to happen when
> > data is compressed. This is particularly a problem when performing
> > independent I/O, since the other processes aren't involved, but [eventually]
> > need to know about the space that was allocated. Collective I/O is easier, but
> > will still require changes to HDF5, etc. Are you wanting to use collective
> > or independent I/O for your dataset writing?
>
> Quincey,
>
> Probably a combination of both; namely, an ideal situation would be a group
> of MPI ranks collectively writing one compressed HDF5 file. On Blue Waters a
> 100k-core run with 32 cores/MCM could therefore result in, say, around 3000
> files, which is not unreasonable.
>
> Maybe I'm thinking about this too simply, but couldn't you compress the data
> on each MPI rank, save it in a buffer, calculate the space required, and then
> write it? I don't know enough about the internal workings of HDF5 to know
> whether that would fit in the HDF5 model. In our particular application on
> Blue Waters, memory is cheap, so there is lots of space in memory for
> buffering data.
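For reference, approach (2) above - MPI_Gather to the subcomm root, which then compresses and writes with serial HDF5 - boils down to something like the sketch below. The 32-rank subcomm, the 1-D float dataset named "field", gzip level 4, and the output file name are all placeholder assumptions, not details taken from this thread:

/* Sketch of approach (2): one compressed HDF5 file per sub-communicator.
 * Assumptions (not from the thread): 32 ranks per subcomm, a 1-D float
 * array of NLOCAL points per rank, gzip level 4, and a placeholder file
 * name per subcomm.  Serial HDF5 is used on the subcomm root only, so
 * the gzip filter is available.                                          */
#include <mpi.h>
#include <hdf5.h>
#include <stdlib.h>
#include <stdio.h>

#define RANKS_PER_SUBCOMM 32
#define NLOCAL            4096

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into sub-communicators of 32 ranks each. */
    int color = world_rank / RANKS_PER_SUBCOMM;
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    int sub_rank, sub_size;
    MPI_Comm_rank(subcomm, &sub_rank);
    MPI_Comm_size(subcomm, &sub_size);

    /* Each rank fills its local block (placeholder data). */
    float *local = malloc(NLOCAL * sizeof(float));
    for (int i = 0; i < NLOCAL; i++)
        local[i] = (float)(world_rank * NLOCAL + i);

    /* Gather all local blocks to rank 0 of the sub-communicator. */
    float *gathered = NULL;
    if (sub_rank == 0)
        gathered = malloc((size_t)sub_size * NLOCAL * sizeof(float));
    MPI_Gather(local, NLOCAL, MPI_FLOAT,
               gathered, NLOCAL, MPI_FLOAT, 0, subcomm);

    /* The subcomm root compresses and writes with serial HDF5. */
    if (sub_rank == 0) {
        char fname[64];
        snprintf(fname, sizeof fname, "subdomain_%06d.h5", color);

        hsize_t dims[1]  = {(hsize_t)sub_size * NLOCAL};
        hsize_t chunk[1] = {NLOCAL};          /* one chunk per source rank */

        hid_t file  = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        hid_t space = H5Screate_simple(1, dims, NULL);
        hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
        H5Pset_chunk(dcpl, 1, chunk);
        H5Pset_deflate(dcpl, 4);              /* gzip level 4 */

        hid_t dset = H5Dcreate2(file, "field", H5T_NATIVE_FLOAT, space,
                                H5P_DEFAULT, dcpl, H5P_DEFAULT);
        H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, gathered);

        H5Dclose(dset);
        H5Pclose(dcpl);
        H5Sclose(space);
        H5Fclose(file);
        free(gathered);
    }

    free(local);
    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}

Dropping the MPI_Gather and having every rank write its own file gives approach (1), which is where the file-count problem comes from; approach (3) would instead serialize access to the subcomm root's file, since serial HDF5 cannot safely have several processes writing to the same file at once.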
What you describe is basically what happens, except that space in the file
needs to be allocated for each block of compressed data. Since each block is
not the same size, the HDF5 library can't pre-allocate the space or
algorithmically determine how much to reserve for each process. In the case of
collective I/O, it's at least theoretically possible for all the processes to
communicate and work it out, but I'm not certain it's going to be solvable for
independent I/O, unless we reserve some processes to either allocate space
(like a "free space server") or buffer the "I/O", etc.

	Quincey
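For anyone who wants to experiment with the collective bookkeeping described above, the coordination step is essentially a prefix sum over the compressed block sizes: each rank compresses its own chunk, the sizes are exchanged, and every rank learns the offset its block would occupy. The sketch below only illustrates that idea - it is not how the HDF5 library does its allocation internally - and it assumes zlib's compress2() as a stand-in filter, an arbitrary chunk size, and gzip level 4:

/* Sketch of the collective space bookkeeping: every rank compresses its
 * own chunk, the compressed sizes are exchanged, and an exclusive prefix
 * sum gives each rank the file offset its block would occupy.  This only
 * illustrates the coordination step, not HDF5's internal allocation code. */
#include <mpi.h>
#include <zlib.h>
#include <stdlib.h>
#include <stdio.h>

#define NLOCAL 4096

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Placeholder data for this rank's chunk. */
    float *chunk = malloc(NLOCAL * sizeof(float));
    for (int i = 0; i < NLOCAL; i++)
        chunk[i] = (float)(rank + i);

    /* Compress the chunk locally; the compressed size is only known now. */
    uLong  src_len = NLOCAL * sizeof(float);
    uLongf dst_len = compressBound(src_len);
    Bytef *compressed = malloc(dst_len);
    compress2(compressed, &dst_len, (const Bytef *)chunk, src_len, 4);

    /* Exclusive prefix sum of compressed sizes -> this rank's file offset. */
    unsigned long my_size = (unsigned long)dst_len, my_offset = 0;
    MPI_Exscan(&my_size, &my_offset, 1, MPI_UNSIGNED_LONG, MPI_SUM,
               MPI_COMM_WORLD);
    if (rank == 0)
        my_offset = 0;   /* MPI_Exscan leaves rank 0's recvbuf undefined */

    printf("rank %d: %lu compressed bytes at offset %lu\n",
           rank, my_size, my_offset);

    /* In a real collective write, each rank would now write its block at
     * my_offset (plus whatever header space the file format needs), e.g.
     * with MPI_File_write_at().                                           */

    free(compressed);
    free(chunk);
    MPI_Finalize();
    return 0;
}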
