If you see a lot of reads in your write-only workloads, it suggests that a "data sieving" optimization is kicking in. When a write only partially updates a block of data, something reads the whole block, updates the changed bytes, and writes the whole block back out. I'm vague about where the optimization happens because RAID devices, file systems, and the ROMIO MPI-IO implementation could all be doing this.
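The read-modify-write cycle can be sketched in a few lines (plain Python, simulated in memory; the 8-byte block size is a made-up example value, real blocks are much larger):

```python
# Sketch of the "data sieving" read-modify-write cycle: a partial
# update to a fixed-size block forces a read of the whole block.
BLOCK_SIZE = 8  # hypothetical block size in bytes

def sieved_write(storage: bytearray, offset: int, data: bytes) -> int:
    """Write `data` at `offset`; return how many bytes had to be READ."""
    start = (offset // BLOCK_SIZE) * BLOCK_SIZE
    end = ((offset + len(data) + BLOCK_SIZE - 1) // BLOCK_SIZE) * BLOCK_SIZE
    block = bytearray(storage[start:end])            # read the whole block(s)
    block[offset - start:offset - start + len(data)] = data  # patch the bytes
    storage[start:end] = block                       # write the block(s) back
    return end - start

storage = bytearray(32)
nread = sieved_write(storage, 10, b"hi")  # a 2-byte write ...
# ... still reads (and rewrites) a full 8-byte block, so nread == 8
```

This is why a "write-only" workload shows up as reads in the statistics: every unaligned or partial write drags a whole block through the read path first.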

With Collective I/O, you can transform your workload into something more contiguous and less likely to trigger data sieving, but it can still happen.
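The transformation collective I/O performs (two-phase I/O) can be sketched like this — a simulated aggregator, no MPI involved; the function name and setup are illustrative only:

```python
# Sketch of two-phase collective I/O: ranks holding small noncontiguous
# pieces ship them to an aggregator, which coalesces adjacent pieces and
# issues a few large contiguous writes instead of many small unaligned ones.
def collective_write(file_image: bytearray, pieces) -> int:
    """pieces: list of (offset, data) gathered from all ranks.
    Returns the number of contiguous writes actually issued."""
    writes = 0
    pieces = sorted(pieces)          # aggregator orders pieces by file offset
    i = 0
    while i < len(pieces):
        off, _ = pieces[i]
        buf = bytearray(pieces[i][1])
        j = i + 1
        while j < len(pieces) and pieces[j][0] == off + len(buf):
            buf += pieces[j][1]      # coalesce adjacent pieces into one run
            j += 1
        file_image[off:off + len(buf)] = buf  # one contiguous write per run
        writes += 1
        i = j
    return writes

img = bytearray(12)
# three "ranks" contribute interleaved 1-byte pieces covering offsets 0..5
n = collective_write(img, [(0, b"a"), (2, b"c"), (4, b"e"),
                           (1, b"b"), (3, b"d"), (5, b"f")])
# six scattered pieces coalesce into a single contiguous write
```

If the coalesced runs still have holes or are not block-aligned, the underlying layers can fall back to data sieving — which is why it "can still happen".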

You can pass MPI-IO hints to HDF5 to turn off collective I/O *and* turn off data sieving -- this is how the Lustre folks got good performance in the 2008-ish time frame. It could either help you a lot or hurt you a lot. I can't tell you more than that over email.
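For ROMIO-based MPI implementations, one hedged way to try this without touching the code is the ROMIO hints file: ROMIO reads extra hints from a file named by the ROMIO_HINTS environment variable, one "key value" pair per line. These hint names are ROMIO-specific (other MPI-IO implementations may ignore them), and whether disabling helps depends entirely on your workload:

```
romio_ds_write disable
romio_cb_write disable
```

Programmatically, the same hints can be set on an MPI_Info object and handed to HDF5 through H5Pset_fapl_mpio when creating the file access property list.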

==rob

On 10/24/2014 11:22 AM, teodavid.shaw wrote:






-------- Original message --------
From: Angel de Vicente <[email protected]>
Date: 24/10/2014 10:51 (GMT+00:00)
To: HDF Users Discussion List <[email protected]>
Subject: Re: [Hdf-forum] Slow writing parallel HDF5 performance (for
only one variable)


Hi,

resurrecting an old thread...

Rob Latham <[email protected]> writes:
 >>> Are you familiar with the Darshan statistics tool? You can use it
 >>> to confirm you are hitting (or not) unaligned writes.
 >>
 >> Not really. I've only heard about it, but I will try it. This issue is
 >> proving pretty hard to figure out, and it is a real bottleneck for our
 >> code, so I will try anything...
 >
 > Yeah, I'm getting off into the woods here, so a tool like Darshan can
 > help you answer the low-level details I'm bugging you about.

At last I got the chance to try Darshan.

Just as a reminder: I have this problem when collectively writing data
from a 4D array to an HDF5 file. The code that shows the problem (not
everywhere, but very badly on the particular cluster I'm using right
now) is attached (phdf5write.f90). As written, the code is meant to run
on 64 processors. The global data to be written to the file is a 4D
array of 100x100x24x12, so each of the 64 processors holds a 4D array of
dimensions 25x25x24x12. When writing to the file, only 16 processors
dump their data: those 16 dump their whole arrays, while the other
processors write nothing. The only things that change among the three
possible "modes" of the code are which processors do the writing and the
offsets in the file. So far I have managed to run it without any issues
on the CURIE cluster, where all three modes behave similarly and (for
this particular case) writing the file takes about a second. But on two
local clusters I run into big problems with mode 3 (PMLZ).

Until now I only knew that this third mode (PMLZ) took much longer
(about two orders of magnitude more). Now, with Darshan, I can see that
something weird is going on. Modes 1 and 2 are very similar both in the
time they take and in their Darshan reports, but mode 3 is completely
puzzling to me: for starters, the report says the code spends a lot of
time READING files and doing metadata operations, while the code only
writes data. In the hope that someone more experienced with I/O issues
can shed some light on this, I attach the Darshan reports for modes 1
and 3. Any help/pointers much appreciated.




--
Ángel de Vicente
http://www.iac.es/galeria/angelv/


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5




--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

