On 09/19/2013 10:43 AM, [email protected] wrote:
> What we are doing is working with The HDF Group to define a work package 
> dubbed "Virtual Datasets" where you can have a virtual dataset in a master 
> file which is composed of datasets in underlying files. It is a bit like 
> extending the soft-link mechanism to allow unions. The method of mapping the 
> underlying datasets onto the virtual dataset is very flexible and so we hope 
> it can be used in a number of circumstances. The two main requirements are:
> 
>  - The use of the virtual dataset is transparent to any program reading the 
> data later.
>  - The writing nodes can write their files independently, so they don't need pHDF5.
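
For concreteness, a mapping of that kind might be written roughly as
below; the H5Pset_virtual call is the shape this API eventually took
in HDF5 1.10, and all file names, dataset names and sizes are made up
for illustration:

/* Sketch: a dataset in a master file stitched together from the
 * datasets of two per-node source files. */
#include <hdf5.h>

int main(void)
{
    hsize_t vdims[1]    = {200};   /* virtual dataset: 200 elements     */
    hsize_t src_dims[1] = {100};   /* each source file contributes 100  */
    hsize_t start[1], count[1] = {100};

    hid_t vspace    = H5Screate_simple(1, vdims, NULL);
    hid_t src_space = H5Screate_simple(1, src_dims, NULL);
    hid_t dcpl      = H5Pcreate(H5P_DATASET_CREATE);

    /* Map node0.h5:/data onto elements 0..99 of the virtual dataset. */
    start[0] = 0;
    H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Pset_virtual(dcpl, vspace, "node0.h5", "/data", src_space);

    /* Map node1.h5:/data onto elements 100..199. */
    start[0] = 100;
    H5Sselect_hyperslab(vspace, H5S_SELECT_SET, start, NULL, count, NULL);
    H5Pset_virtual(dcpl, vspace, "node1.h5", "/data", src_space);

    /* The master file now presents /data as one contiguous dataset,
     * so any later reader sees it as an ordinary dataset. */
    hid_t file = H5Fcreate("master.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dset = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, vspace,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    H5Dclose(dset); H5Fclose(file);
    H5Pclose(dcpl); H5Sclose(src_space); H5Sclose(vspace);
    return 0;
}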

As a matter of fact, this is pretty much what we have already done for
our own research: we, too, patched the HDF5 library so that data can be
written to multiple files and read back in a way that is entirely
transparent to the application. You can find our patch, along with a
much more detailed description, on our website:
http://www.wr.informatik.uni-hamburg.de/research/projects/icomex/multifilehdf5
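
To illustrate the independent-writer side of such a workflow (this is
not the specifics of our patch, which makes the splitting transparent
to the application): each process can simply write its own file with
the plain serial HDF5 library, so no pHDF5 is needed at write time. A
minimal sketch, with made-up file names and data:

#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One file per rank, written with the serial HDF5 library. */
    char fname[64];
    snprintf(fname, sizeof fname, "part_%04d.h5", rank);

    double data[100];
    for (int i = 0; i < 100; i++)
        data[i] = rank * 100.0 + i;          /* per-rank payload */

    hsize_t dims[1] = {100};
    hid_t file  = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL, H5P_DEFAULT, data);

    H5Dclose(dset); H5Sclose(space); H5Fclose(file);
    MPI_Finalize();
    return 0;
}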

On our system, we actually saw an improvement in wall-clock time for
the entire write-reconstruct-read process compared to writing to a
shared file and reading it back in a single stream. This may differ on
other systems, but we expect at least a huge benefit in CPU time, since
the multifile approach keeps the parallel part of the workflow fast.

Of course, we are very interested to hear about other people's
experiences with transparent multifiles.

Cheers,
Nathanael Hübbe
