Hi,
Thanks for this. Comments inline.
On 7/22/16 12:13 PM, Nelson, Jarom wrote:
If you can move to HDF5 1.10, I would recommend independent files for
each MPI rank, and then create a master file (created independently
perhaps by rank 0) with Virtual Datasets linking in the data from each
rank in the format you need. Virtual Datasets can be created with file
matching patterns for dynamically increasing datasets, so you might
look into using that feature.
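For concreteness, here is a minimal h5py sketch of the master-file
idea (file and dataset names are hypothetical, and it assumes an h5py
built against HDF5 1.10 with the VDS API available):

    import h5py
    import numpy as np

    nranks, n = 4, 100  # illustrative: 4 per-rank files, 100 elements each

    # Map the 'data' dataset from each per-rank file into one row of a
    # single virtual dataset in the master file.
    layout = h5py.VirtualLayout(shape=(nranks, n), dtype='f8')
    for r in range(nranks):
        layout[r] = h5py.VirtualSource('rank%d.h5' % r, 'data', shape=(n,))

    # The master file is created once, by a single process (e.g. rank 0).
    with h5py.File('master.h5', 'w') as f:
        f.create_virtual_dataset('data', layout, fillvalue=np.nan)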
We don't have existing tools relying on a particular version, so we are
nominally free to move to HDF5 1.10.x. However, it won't be completely
straightforward because I have been relying for now on the Homebrew
version, which is currently 1.8.16. I'd have to tweak the recipe to use
1.10.x, which is not a showstopper.
I found this approach much faster than writing a single shared file
with collective I/O (~5-10x speedup on a Lustre filesystem). You don't
need to do any collective reads or writes, and I think we could even
bypass parallel HDF5 altogether. Note that this only works if every
parallel open of the Virtual Dataset (i.e. by more than one process)
is non-collective and read-only. If you need read-write access to the
master file, you can't access a Virtual Dataset using collective
operations. You can, however, have as many processes as you like read
from a Virtual Dataset in a file opened read-only.
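To illustrate the read-only case (names hypothetical): each rank can
open the master file with the plain serial library, with no MPI-IO
driver involved at all:

    from mpi4py import MPI
    import h5py

    rank = MPI.COMM_WORLD.Get_rank()

    # Any number of processes may do this concurrently: an ordinary
    # serial, read-only open of the master file containing the VDS.
    with h5py.File('master.h5', 'r') as f:
        my_row = f['data'][rank]  # each rank reads its own slice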
If you have other tools that use your data but can’t move to HDF5
1.10, you can h5repack a file with Virtual Datasets to remove the
Virtual Datasets, and it should be compatible with HDF5 1.8 (use
h5repack from HDF5 1.10 patch 1 or later). This also worked well for
us, and I was able to load a repacked file in IDL under a 1.8 HDF5
library. However, h5repack is not a parallel application, so it can be
slow to repack a very large file, on the order of minutes per GB.
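For reference, the invocation is just the stock tool with no special
options (file names hypothetical):

    h5repack master.h5 repacked.h5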
After having thought a little more about likely parallel models, I think
now we can arrange that:
* Only one rank will write to a particular dataset.
* A dataset will not be read from in the same job in which it was
  written.
* A dataset may be read by one or more ranks.
I *think* if that's the case, we could use a hierarchical multi-file
format without resorting to virtual datasets, no? I still have some
reading and experimenting to do, but if you have particular information
that would speak to the likely success of this approach, I'd be happy to
hear it.
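To make that concrete, here is the sort of thing I have in mind (a
sketch only, with made-up names; external links have been available
since HDF5 1.8, so no virtual datasets are required):

    import h5py
    import numpy as np

    # Each rank writes its own file with the ordinary serial library;
    # no parallel HDF5 is needed if only one rank writes a given dataset.
    with h5py.File('rank3.h5', 'w') as f:
        f.create_dataset('event7/tracks', data=np.zeros(10))

    # A small top-level file (written by one process) stitches the
    # per-rank files together with external links.
    with h5py.File('top.h5', 'w') as f:
        f['rank3'] = h5py.ExternalLink('rank3.h5', '/')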
Thanks,
Chris.
Jarom
From: Hdf-forum [mailto:[email protected]] On Behalf Of Chris Green
Sent: Friday, July 22, 2016 9:32 AM
To: [email protected]
Subject: [Hdf-forum] Parallel dataset resizing strategies
Hi,
I am relatively new to HDF5 and HDF5/parallel, and although I have
experience with MPI it is not extensive. We are exploring ways of
saving data in parallel using HDF5 in a field in which it is
practically unknown up to now.
Our paradigm is "parallel modular event processing:"
* A typical job processes many "events."
* An event contains all of the interesting data (raw and processed)
associated with some time interval.
* Each event can be processed independently of all other events.
* Each event's data can be subdivided into internal components,
"data products."
* "Modules" are processing subunits which read or generate one or
more data products for each event.
* One can calculate a data dependency graph specifying the allowed
ordering and/or parallelism of modules processing one or more
events simultaneously for a given job configuration and event
structure.
We have been using h5py with HDF5 and OpenMPI to explore different
strategies for parallel I/O in a future parallel event-processing
framework. One of the approaches we have come up with so far is to
have one HDF5 dataset per unique data product / writer module
combination, keeping track of the different relevant sections of each
dataset via (for now) an external database. This works well in serial
tests, but in parallel tests we are running up against the constraint
that dataset resizing is a collective operation, meaning that all
ranks including non-writers will have to become aware of and duplicate
dataset resizing operations required by other writers. The problem
seems to get even worse if there's a possibility that two or more
instances of a module would need to extend and write to the same
dataset at the same time (while processing different events, say),
since they will have to coordinate and agree on the new size of the
dataset and their respective sections thereof.
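To illustrate the constraint (a minimal sketch with made-up names,
using h5py's MPI-IO driver):

    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # All ranks open the shared file and create the extensible
    # dataset collectively.
    f = h5py.File('products.h5', 'a', driver='mpio', comm=comm)
    d = f.require_dataset('hits', shape=(0,), maxshape=(None,),
                          dtype='f8', chunks=(100,))

    # Suppose rank 0 needs 100 more rows. Because resize is
    # collective, every rank (writers and non-writers alike) must
    # learn the new size and issue the identical call.
    new_len = comm.bcast(d.shape[0] + 100 if rank == 0 else None, root=0)
    d.resize((new_len,))
    if rank == 0:
        d[-100:] = 42.0  # only the writer actually writes (independent I/O)
    f.close()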
Are we misunderstanding the problem, or is it really this hard? Has
anyone else hit upon a reasonable strategy for handling this or
something like it?
Any pointers appreciated.
Thanks,
Chris Green.
--
Chris Green <[email protected]>, FNAL CS/SCD/ADSS/SSI/TAC;
'phone (630) 840-2167; Skype: chris.h.green;
IM: [email protected], chissgreen (AIM),
chris.h.green (Google Talk).
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5