Hi Daniel,


> haven't received e-mails with your replies :( so I cannot reply to them
> specifically (or do not know how). So, replying to my original...

I'm not sure what you mean; I saw my email go through to the Forum list, but you 
could not reply to it?

> I understand the notice about memory-to-kernel writes. However, again, I
> am comparing the single-file/multiple-file strategies. Both give quite
> different results. Moreover, the multiple-file case corresponds, within my
> measurements, to the listed peak I/O bandwidth of the storage systems. The
> single-file case is much, much worse. Obviously, this is not limited by the
> memory-to-kernel copying.

As long as each process writes the same amount of data in the three cases you 
mentioned (which, as I understood it, is the case here), I do not suspect that 
the in-kernel memory copy is causing the huge gap in performance.

> Thanks for the link to h5perf, I will try it.
>
> @Mohamad:
>
> Thanks for the hint. All file systems are Lustre-based, indeed with the
> default stripe count of 1. I will rerun my measurements with different
> stripe sizes/counts and post the results.

Yes, with a stripe count of 1 you will definitely be slowed down by 
locking/contention issues, so it's not a fair comparison against the 
multiple-file case, where each file may/will end up on a different OSS.

Increasing the stripe count/size should get you much better performance (I bet).
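
In case it helps, here is a rough sketch of how the striping could be requested 
when the shared file is created, by passing the ROMIO hints "striping_factor" 
and "striping_unit" through the MPI-IO file access property list. Whether the 
hints are honored depends on the MPI stack on your system, the helper name and 
hint values below are only placeholders, and setting the striping on the target 
directory beforehand with lfs setstripe works as well, so treat this as an 
illustration rather than an exact recipe:

#include <hdf5.h>
#include <mpi.h>

/* Create an HDF5 file for parallel access with Lustre striping hints.
   The example values (8 OSTs, 4 MiB stripe size) are placeholders; the
   right numbers depend on the file size and your machine's own docs. */
hid_t create_striped_file(const char *name, MPI_Comm comm)
{
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "striping_factor", "8");      /* stripe count */
    MPI_Info_set(info, "striping_unit", "4194304");  /* stripe size in bytes */

    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, info);

    /* Lustre striping is fixed at file creation time, so the hints
       only matter when the file is first created. */
    hid_t file = H5Fcreate(name, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    H5Pclose(fapl);
    MPI_Info_free(&info);
    return file;
}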



I don't know what machine you are using, but most setups have a recommended 
stripe size/count for the size of file your application produces. Those 
recommendations are usually posted somewhere in the user guide for that 
machine, if one exists.



Thanks,

Mohamad





> Daniel
>
> On 30. 8. 2013 16:05, Daniel Langr wrote:
> > I've run a benchmark where, within an MPI program, each process wrote
> > 3 plain 1D arrays to 3 datasets of an HDF5 file. I've used the
> > following writing strategies:
> >
> > 1) each process writes to its own file,
> > 2) each process writes to its own dataset in the same file,
> > 3) each process writes to the same dataset in the same file.
> >
> > I've tested 1)-3) for both fixed/chunked datasets (chunk size 1024),
> > and I've tested 2)-3) for both independent/collective options of the
> > MPI driver. I've also used 3 different clusters for the measurements
> > (all quite modern).
> >
> > As a result, the running (storage) times of the same-file strategy, i.e.
> > 2) and 3), were orders of magnitude longer than the running times
> > of the separate-files strategy. For illustration:
> >
> > cluster #1, 512 MPI processes, each process stores 100 MB of data,
> > fixed datasets:
> >
> > 1) separate files: 2.73 [s]
> > 2) single file, independent calls, separate datasets: 88.54 [s]
> >
> > cluster #2, 256 MPI processes, each process stores 100 MB of data,
> > chunked datasets (chunk size 1024):
> >
> > 1) separate files: 10.40 [s]
> > 2) single file, independent calls, shared datasets: 295 [s]
> > 3) single file, collective calls, shared datasets: 3275 [s]
> >
> > Any idea why the single-file strategy gives such poor writing performance?
> >
> > Daniel
>

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
