On 09/01/2014 08:26 AM, houssen wrote:
After a lot of testing, my understanding is that:
1. if the data (belonging to each MPI proc.) must be interleaved in the file,
then P-HDF5 (and MPI-IO) can significantly reduce the elapsed time spent on I/O
2. if not (independent data written independently by each MPI proc.), then
the P-HDF5 / MPI-IO / sequential approaches are equivalent


A posteriori, this seems logical to me. Are there other situations where HDF5
may improve the I/O speed-up (reduce the elapsed time)?

Yes. Consider a system like Blue Gene, with very many MPI processes and not very many I/O servers.

Collective I/O (P-HDF5) will give you two additional benefits, even if the data is non-interleaved:

- coalescing requests down to a subset of processes (the I/O aggregators). Instead of a quarter million MPI clients hitting the file system, maybe that is reduced to a thousand, say (a sketch of steering this via MPI-IO hints follows this list).

- some file-system-aware optimizations. For GPFS, writes should be aligned to the file system block boundary. For Lustre, writes should be done in a group-cyclic distribution so that an MPI I/O aggregator only ever speaks to one I/O server (but remember there are going to be on the order of a hundred or a thousand of these aggregators, so overall there is parallelism and improved observed bandwidth).
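
If it helps, here is a rough sketch (in C) of how MPI-IO hints can be passed through the HDF5 file-access property list. "cb_nodes" and "romio_cb_write" are standard ROMIO hint names; the values, and the helper name open_parallel_file, are only examples - the right settings depend on your MPI library and file system.

  /* Sketch only: pass MPI-IO hints to HDF5 via the file-access
   * property list.  Hint values below are illustrative, not tuned.
   * Call after MPI_Init and with HDF5 built --enable-parallel. */
  #include <hdf5.h>
  #include <mpi.h>

  hid_t open_parallel_file(const char *name)
  {
      MPI_Info info;
      MPI_Info_create(&info);
      MPI_Info_set(info, "cb_nodes", "64");           /* cap the number of I/O aggregators */
      MPI_Info_set(info, "romio_cb_write", "enable"); /* force collective buffering on writes */

      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, info);   /* MPI-IO driver, carrying our hints */

      hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
      H5Pclose(fapl);
      MPI_Info_free(&info);
      return file;
  }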

==rob


Franck

On 2014-08-08 17:26, Rob Latham wrote:
On 08/08/2014 03:27 AM, houssen wrote:
In short: are there things to know / make sure of / be aware of to get
good performance with P-HDF5?

- turn on collective I/O.  It's not enabled by default (see the sketch after this list).

- HDF5 metadata might be a factor if you have very many small
datasets, but for most applications it's not important

- consult your MPI library for any file-system specific tuning you
might be able to do.  For example, Intel-MPI needs you to set an
environment variable before it will use any of the GPFS or Panasas
optimizations it has written.

- be mindful of type conversions:  if your data in memory is a 4-byte
float, but it is stored as 8-byte doubles on disk, HDF5 will "break
collective" and do that I/O independently.


To test this I wrote an MPI code. ... I expected to get better
performance with MPI-IO and P-HDF5 than with the sequential approach.
The spirit of this test code is very simple / basic (each MPI process
writes its own block of data to the same file, or to separate files in
the sequential approach).

Note: in each case (sequential, MPI-IO, P-HDF5), when I say "write data
to a file", I mean writing big blocks of data at once (I do not write
values one by one - I write the biggest block possible, but smaller
than 2 GB).
Note: I tried with N = 1, 2, 4, 8, 16 MPI processes.

in 2014, 16 is not very parallel.  serial I/O has many benefits at
modest levels of parallelism: caching, mostly.

Note: I generated files (MPI-IO, P-HDF5) whose size scaled from 1 GB to
16 GB (which looks like a "very big" file to me).

that's adequate, yes

Note: I followed the P-HDF5 documentation (use the H5P_FILE_ACCESS and
H5P_DATASET_XFER property lists + select hyperslabs "by chunks").
Note: the file system is GPFS (it was installed by the cluster
vendor, so it is supposed to be ready to get performance out of P-HDF5 -
I am an "application" guy trying to use HDF5, not a sysadmin
who would be familiar with the complex details of the file
system).

Now we are getting somewhere.

Note: I compiled the HDF5 package like this: "./configure
--enable-parallel".
Note: I use CentOS + GNU compilers (for both the HDF5 package and my test
code) + hdf5-1.8.13.
Note: I use mpic++ (not the h5pxx compiler wrappers - actually I didn't get why
HDF5 provides compilers) to compile my test code - is this a problem?

just makes it easier to pick up any libraries needed.  I don't use
the wrappers, either, which means sometimes I need to figure out what
new library (like -ldl) HDF5 needs.

Any relevant clue / information would be appreciated. If what I observe
is logical, I would just like to understand why, and how / when it is
possible to get performance out of P-HDF5. I just want to get some
logic out of this.

If you are using GPFS, there is one optimization that goes a long way
towards improving performance: aligning writes to file system block
boundaries.  See this email from a few weeks ago:

http://mail.lists.hdfgroup.org/pipermail/hdf-forum_lists.hdfgroup.org/2014-July/007963.html
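
For completeness, one way to ask HDF5 itself for that alignment is H5Pset_alignment on the file-access property list; a rough sketch follows. The 8 MiB figure is only an example - use your file system's actual block size (ask your admins) - and the linked thread may describe further tuning.

  /* Sketch: place HDF5 allocations on file-system block boundaries.
   * The block size passed in is an example, not your GPFS block size. */
  #include <hdf5.h>
  #include <mpi.h>

  hid_t make_aligned_fapl(hsize_t fs_block)  /* e.g. 8 * 1024 * 1024 */
  {
      hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
      H5Pset_alignment(fapl, 1, fs_block);   /* align every allocation >= 1 byte */
      return fapl;
  }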

==rob


Thanks for the help,

FH

PS: I can give more information, and the code, if needed.




--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
