On Thu, Sep 19, 2013 at 08:43:48AM +0000, [email protected] wrote:
> I have been following this thread with interest since we have the same issue 
> in the synchrotron community, with new detectors generating hundreds to thousands of 
> 2D frames/sec and total rates approaching 10 GB/sec using multiple parallel 
> 10 GbE streams from different detector nodes. What we have found is:
> 
>  - Lustre is better at managing the pHDF5 contention between nodes than GPFS 
> is.
>  - GPFS is better at streaming data from one node, if there is no contention.
>  - Having the nodes write to separate files is better than using pHDF5 to 
> enable all nodes to write to one.

I would wager a tasty beverage or a box of donuts that the reason you
see poor performance with GPFS to a shared file is that your writes are
not aligned to file system block boundaries.  On large HPC systems, the
MPI-IO layer will often take care of that file system block boundary
alignment for you -- *if* you turn on collective I/O.

If you are using independent POSIX I/O, then there won't be much that
HDF5 or MPI-IO can do to help you out.
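
For reference, here is a minimal sketch (not from the original posts;
the 4 MiB alignment value is only a placeholder for whatever your file
system block size actually is) of opening a shared file through the
MPI-IO driver with alignment and collective transfers turned on:

/* Minimal sketch, not from the original posts: open a shared HDF5 file
 * through the MPI-IO driver, align object allocations to a (placeholder)
 * 4 MiB file system block size, and request collective transfers so the
 * MPI-IO layer can handle the block-boundary alignment of the writes. */
#include <hdf5.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* File access property list: MPI-IO driver plus an alignment hint. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    H5Pset_alignment(fapl, 1, 4 * 1024 * 1024);  /* align allocations to 4 MiB */

    hid_t file = H5Fcreate("shared.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Dataset transfer property list: turn on collective I/O. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    /* ... create a dataset and have every rank call
     *     H5Dwrite(dset, memtype, memspace, filespace, dxpl, buf);
     * using this dxpl ... */

    H5Pclose(dxpl);
    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}

With the transfer property list set to collective, the MPI-IO layer is
free to aggregate and align the writes; with independent I/O it never
gets the chance.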

> What we are doing is working with The HDF Group to define a work package 
> dubbed "Virtual Datasets" where you can have a virtual dataset in a master 
> file which is composed of datasets in underlying files. It is a bit like 
> extending the soft-link mechanism to allow unions. The method of mapping the 
> underlying datasets onto the virtual dataset is very flexible and so we hope 
> it can be used in a number of circumstances. The two main requirements are:
> 
>  - The use of the virtual dataset is transparent to any program reading the 
> data later.
>  - The writing nodes can write their files independently, so they don't need pHDF5.
> 
> An additional benefit is that the data can be compressed, so data rates may be 
> reduced drastically, depending on your situation.

You're proposing something akin to ADIOS, except that the interface
continues to be the community-standard HDF5.  How interesting!

This approach will make it impossible to benefit from several
collective MPI-IO optimizations, but it does open the door to another
family of optimizations (one would likely trawl the many ADIOS
publications for ideas).
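
For anyone curious what the master-file mapping might look like: the
sketch below is illustrative only, since the Virtual Dataset API was
still being defined at the time of this thread.  It assumes an
H5Pset_virtual-style mapping call and two hypothetical per-node files
(node0.h5 and node1.h5), each holding a /frames dataset of 100 x 512 x
512 frames, stitched into one 200 x 512 x 512 virtual dataset.

/* Illustrative sketch only; the exact API is an assumption, modelled on an
 * H5Pset_virtual-style mapping call.  Two hypothetical per-node files,
 * node0.h5 and node1.h5, each holding a /frames dataset of 100 x 512 x 512,
 * are stitched into one 200 x 512 x 512 virtual dataset in a master file. */
#include <hdf5.h>

int main(void)
{
    hsize_t vdims[3]   = {200, 512, 512};   /* shape readers will see       */
    hsize_t srcdims[3] = {100, 512, 512};   /* shape of each source dataset */

    hid_t vspace   = H5Screate_simple(3, vdims, NULL);
    hid_t srcspace = H5Screate_simple(3, srcdims, NULL);
    hid_t dcpl     = H5Pcreate(H5P_DATASET_CREATE);

    /* Map each node's dataset onto its slab of the virtual dataset. */
    const char *files[2] = {"node0.h5", "node1.h5"};
    for (int i = 0; i < 2; i++) {
        hsize_t start[3] = {(hsize_t)i * 100, 0, 0};
        H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, srcdims, NULL);
        H5Pset_virtual(dcpl, vspace, files[i], "/frames", srcspace);
    }

    hid_t master = H5Fcreate("master.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t vdset  = H5Dcreate2(master, "/frames", H5T_NATIVE_USHORT, vspace,
                              H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(vdset);
    H5Fclose(master);
    H5Pclose(dcpl);
    H5Sclose(srcspace);
    H5Sclose(vspace);
    return 0;
}

A reader simply opens master.h5:/frames as one dataset, and the per-node
writers never touch the master file, which matches the two requirements
quoted above.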

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
