Hi Daniel,

You did not say which parallel file system you used in 2) and 3).  The
performance of the parallel file system matters a great deal in those cases.
For example, if your PFS does not truly scale, 512 processes accessing the
same file could drop the I/O speed to roughly 1/500 of the single-process
speed (they all compete for a write token, for example).

Another issue: if your cluster runs Linux, 100 MB of data per process is
far too small to draw any conclusion.  The Linux kernel buffers as much
I/O as it can.  For example, if each node has 4 GB of memory and is not
too busy, it can use 3+ GB of that memory for I/O buffering.  Any write
smaller than 3 GB is then merely copying data from user memory to kernel
memory, so the measured "I/O speed" is memory-to-memory speed, not true
memory-to-disk speed.  One way to confirm this is to write at least twice
the total memory of the node.  Compare that write speed against the
100 MB speed and you should see a big drop.
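
As a rough sanity check you could use something like the sketch below.  It
is my own minimal file-per-process POSIX benchmark, not anything from the
HDF5 distribution; NBYTES and BUFSIZE are made-up values that you would
tune so the per-process total exceeds the memory of a node:

/* Minimal sketch: each rank writes NBYTES to its own file and rank 0
 * reports the apparent per-process bandwidth.  Set NBYTES well above
 * the node memory to defeat the Linux page cache, then compare against
 * a run with a small size such as 100 MB. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>

#define NBYTES  (8ULL * 1024 * 1024 * 1024)  /* e.g. 8 GB per process */
#define BUFSIZE (8 * 1024 * 1024)            /* 8 MB write buffer */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char fname[64];
    snprintf(fname, sizeof fname, "bench.%d.dat", rank);
    int fd = open(fname, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    char *buf = malloc(BUFSIZE);
    memset(buf, rank & 0xff, BUFSIZE);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();

    unsigned long long written = 0;
    while (written < NBYTES) {
        if (write(fd, buf, BUFSIZE) < 0) { perror("write"); break; }
        written += BUFSIZE;
    }
    fsync(fd);   /* force the data out of the kernel cache */
    close(fd);

    MPI_Barrier(MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%.2f MB/s per process\n",
               (double)NBYTES / (1024.0 * 1024.0) / (t1 - t0));

    free(buf);
    MPI_Finalize();
    return 0;
}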

I would suggest you build and use the performance measurement tool,
perform/h5perf, in the HDF5 source.  h5perf measures all three I/O
speeds: POSIX, MPI-IO, and parallel HDF5.  It will give you a better
understanding of what your parallel file system can actually deliver.
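
If I remember the options correctly, an invocation along the lines of the
one below would compare all three APIs with 1 GB per process per dataset
and an 8 MB transfer buffer (-A selects the APIs, -e the bytes per process
per dataset, -p/-P the process counts, -x/-X the transfer buffer sizes,
-i the number of iterations).  Please run h5perf -h on your build to
verify the exact flags before using it:

mpiexec -n 64 h5perf -A posix,mpiio,phdf5 -e 1G -p 64 -P 64 -x 8M -X 8M -i 3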

Hope this helps.

-Albert Cheng
THG staff

On Aug 30, 2013, at 9:05 AM, Daniel Langr <[email protected]> wrote:

> I've run some benchmarks in which, within an MPI program, each process wrote 3 
> plain 1D arrays to 3 datasets of an HDF5 file. I've used the following 
> writing strategies:
> 
> 1) each process writes to its own file,
> 2) each process writes to the same file to its own dataset,
> 3) each process writes to the same file to a same dataset.
> 
> I've tested 1)-3) for both fixed/chunked datasets (chunk size 1024), and I've 
> tested 2)-3) for both independent/collective options of the MPI driver. I've 
> also used 3 different clusters for measurements (all quite modern).
> 
> As a result, the running (storage) times of the same-file strategies, i.e. 2) 
> and 3), were orders of magnitude longer than the running times of the 
> separate-files strategy. For illustration:
> 
> cluster #1, 512 MPI processes, each process stores 100 MB of data, fixed data 
> sets:
> 
> 1) separate files: 2.73 [s]
> 2) single file, independent calls, separate data sets: 88.54[s]
> 
> cluster #2, 256 MPI processes, each process stores 100 MB of data, chunked 
> data sets (chunk size 1024):
> 
> 1) separate files: 10.40 [s]
> 2) single file, independent calls, shared data sets: 295 [s]
> 3) single file, collective calls, shared data sets: 3275 [s]
> 
> Any idea why the single-file strategy gives so poor writing performance?
> 
> Daniel
> 

