I am looking for design pointers on how people handle extremely large
datasets, e.g. 10TB - 5000TB, using HDF. We would like to be able to do
parallel reads or writes from multiple MPI ranks, but want these ranks to
be on different nodes, with each node able to utilize its own local disk.
That is, instead of, say, 50TB of data being in a single file on a network
file system, we might want to arrange for ~3TB to be on each of 20 nodes'
local file systems. Clearly, this is not actually just one file but many;
I wonder if there are patterns or libraries that help with this sort of
staging, such that it can be handled by one set of tools and then be relatively
invisible to the MPI-based analysis programs that wish to work with it.
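
To make the layout concrete, here is a rough sketch of the write side of the
file-per-node arrangement I have in mind, in Python with mpi4py and h5py. The
/local/scratch path, shard sizes, and dataset names are placeholders, not from
any real system:

    from mpi4py import MPI
    import os
    import numpy as np
    import h5py

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Hypothetical shard: each rank owns a contiguous block of rows of one
    # large logical 2-D dataset.
    rows_per_rank = 1_000_000
    local_block = np.random.rand(rows_per_rank, 64)  # stand-in for real data

    # One independent HDF5 file per rank, written straight to node-local disk;
    # no MPI-IO / parallel-HDF5 driver is involved, so nothing touches the
    # network file system.
    shard_path = os.path.join("/local/scratch", "dataset.rank%05d.h5" % rank)
    with h5py.File(shard_path, "w") as f:
        dset = f.create_dataset("block", data=local_block, chunks=True)
        # Record where this shard sits in the global index space so readers
        # (or a later re-assembly step) can map it back to the logical dataset.
        dset.attrs["global_row_offset"] = rank * rows_per_rank

    comm.Barrier()  # every shard on disk before cross-node analysis begins

On the read side, each analysis rank would open only the shard(s) on its own
node and use the stored offset to translate between local and global indices;
the open question is whether existing tooling hides that bookkeeping.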
