I am looking for design pointers on how people handle extremely large datasets (e.g., 10 TB to 5000 TB) with HDF. We would like to do parallel reads and writes from multiple MPI ranks, with those ranks spread across different nodes and each node able to use its own local disk. That is, instead of, say, 50 TB of data sitting in a single file on a network file system, we might arrange for roughly 3 TB to live on each of 20 nodes' local file systems. Clearly this is no longer one file but many, and I wonder whether there are patterns or libraries that help with this sort of staging, so that it can be handled by one set of tools and then remain relatively invisible to the MPI-based analysis programs that want to work with the data.
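
For concreteness, below is a minimal sketch of one pattern we have been considering, assuming parallel HDF5 built with the MPI-IO driver: ranks are grouped by node with MPI_Comm_split_type, and each node's group collectively writes its own HDF5 file to node-local storage. The path /local/scratch, the dataset name "shard", and the dataset dimensions are placeholders, and this is only an illustration of the layout, not a claim that it is the recommended approach.

```c
/* Sketch: file-per-node layout.  Each node's ranks form their own
 * communicator and write one HDF5 file on that node's local disk.
 * Placeholder path and dataset sizes; error checking omitted. */
#include <hdf5.h>
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that share a node (MPI-3 shared-memory domains). */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &node_comm);

    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Use the world rank of the node's rank 0 as a node id for naming. */
    int node_id = world_rank;
    MPI_Bcast(&node_id, 1, MPI_INT, 0, node_comm);

    char fname[256];
    snprintf(fname, sizeof(fname), "/local/scratch/shard_node%06d.h5", node_id);

    /* Collective create over the node communicator only, so MPI-IO
     * traffic for this file never leaves the node. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, node_comm, MPI_INFO_NULL);
    hid_t file = H5Fcreate(fname, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* Demo dataset: one row per rank on this node. */
    hsize_t dims[2] = { (hsize_t)node_size, 1024 };
    hid_t fspace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate2(file, "shard", H5T_NATIVE_DOUBLE, fspace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    double row[1024];
    for (int i = 0; i < 1024; i++) row[i] = (double)world_rank;

    /* Each node-local rank selects and writes its own row collectively. */
    hsize_t start[2] = { (hsize_t)node_rank, 0 };
    hsize_t count[2] = { 1, 1024 };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t mspace = H5Screate_simple(2, count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, dxpl, row);

    H5Pclose(dxpl); H5Sclose(mspace); H5Dclose(dset);
    H5Sclose(fspace); H5Fclose(file); H5Pclose(fapl);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```

The part this sketch does not solve is the one I am really asking about: how analysis programs later see these per-node shards as a single logical dataset (global indexing, redistribution, restaging onto a different node count), and whether existing tools or libraries already provide that layer.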
