Hi Maxime,

If I understand you correctly, you are getting pretty good performance when 
writing the large dataset, but something else is slowing your I/O down at the 
end?

Let me first mention that HDF5 manages a metadata cache in the background (not 
to be confused with your metadata dataset). Any update to the HDF5 file may 
trigger a flush of the metadata cache at some point in time (usually at file 
close, if you are not doing a lot of metadata updates). This metadata is 
internal to HDF5 and contains information about the file such as object 
headers, the file superblock, and many other things that are transparent to 
the application. This explains why you are seeing I/O after your dataset is 
closed. You will stop seeing updates to the file after you call H5Fclose().
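If you want to see when that flush actually happens, a minimal sketch is below 
(close_file_with_timing is just an illustrative name I made up; H5Fflush and 
H5Fclose are the real calls). Flushing explicitly before the close lets you 
time the metadata flush on its own:

#include <stdio.h>
#include <hdf5.h>
#include <mpi.h>

/* Illustrative helper: flush and close the file, timing each step separately
   so the metadata-cache flush cost is visible on its own.
   file_id is the hid_t returned earlier by H5Fcreate()/H5Fopen(). */
static void close_file_with_timing(hid_t file_id, int mpi_rank)
{
    double t0 = MPI_Wtime();
    H5Fflush(file_id, H5F_SCOPE_GLOBAL);  /* push cached metadata to disk now */
    double t1 = MPI_Wtime();
    H5Fclose(file_id);                    /* should now return with little extra I/O */
    double t2 = MPI_Wtime();
    if (mpi_rank == 0)
        printf("H5Fflush: %.1f s   H5Fclose: %.1f s\n", t1 - t0, t2 - t1);
}

That should tell you whether the extra minutes are being spent in the 
flush/close path or somewhere else.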

The small "metadata" dataset is something I don't understand. What do you mean 
by "it contains various data types"? A dataset is created with a single HDF5 
datatype and can't have multiple datatypes, and variable-length datatypes are 
not permitted in parallel, so please explain what you mean by that.
Also, how large is the small dataset and how are you writing to it? Do all 960 
MPI ranks write different hyperslabs to this small dataset (I wouldn't imagine 
it would be small then), or does only one rank write your metadata to it? If 
all processes are writing, are you using collective I/O? Can you use an 
attribute instead of a small dataset?
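If the run details are just a handful of scalars or short strings, attributes 
are usually the lighter-weight option. A rough sketch, assuming the values are 
identical on all ranks (attribute creation and writing are metadata 
operations, so in parallel HDF5 every rank must make the same calls with the 
same values); the attribute names, fields, and the dset_id handle are only 
placeholders:

#include <hdf5.h>

/* Hypothetical example: attach run metadata as attributes on the big dataset.
   Every rank calls this collectively with identical values. */
static void write_run_metadata(hid_t dset_id, double timestep, long long nsteps)
{
    hid_t space = H5Screate(H5S_SCALAR);

    hid_t a_dt = H5Acreate2(dset_id, "timestep", H5T_NATIVE_DOUBLE,
                            space, H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(a_dt, H5T_NATIVE_DOUBLE, &timestep);
    H5Aclose(a_dt);

    hid_t a_ns = H5Acreate2(dset_id, "nsteps", H5T_NATIVE_LLONG,
                            space, H5P_DEFAULT, H5P_DEFAULT);
    H5Awrite(a_ns, H5T_NATIVE_LLONG, &nsteps);
    H5Aclose(a_ns);

    H5Sclose(space);
}

If you do keep the small dataset instead, either set collective transfers on 
the dataset transfer property list (H5Pset_dxpl_mpio with 
H5FD_MPIO_COLLECTIVE) or have a single rank write it independently; 960 ranks 
each issuing tiny independent writes to the same small dataset is the kind of 
pattern that can take surprisingly long on Lustre.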

It would be great if you could share the application so we can try it out, but 
I understand that may not be possible.

Thanks,
Mohamad


-----Original Message-----
From: Hdf-forum [mailto:[email protected]] On Behalf Of 
Maxime Boissonneault
Sent: Tuesday, January 27, 2015 2:49 PM
To: [email protected]
Subject: [Hdf-forum] File keeps being updated long after the dataset is closed

Hi,
I am writing a very large (572GB) file with HDF5, on a Lustre filesystem, using 
960 MPI processes spread over 120 nodes.

I am monitoring the I/O on the filesystem while this happens. I see a very 
large peak, around 2 GB/s, for roughly 3-4 minutes. My internal timers 
(covering creating the dataset, selecting the memory hyperslab, writing the 
dataset, and closing the dataset) tell me the write takes 180s, which 
corresponds to the peak I see on our Lustre servers.
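(Simplified sketch of the timed region below; the names, the 1-D contiguous 
layout, and the collective transfer mode shown are placeholders rather than 
the exact code.)

#include <hdf5.h>

/* Simplified sketch of the timed region: each rank writes its own contiguous
   slab of the big double dataset (placeholder names and layout). */
static void write_big_dataset(hid_t file_id, const double *local_buf,
                              hsize_t total_elems, hsize_t local_elems,
                              hsize_t offset)
{
    hid_t filespace = H5Screate_simple(1, &total_elems, NULL);
    hid_t dset = H5Dcreate2(file_id, "/data", H5T_NATIVE_DOUBLE, filespace,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    hid_t memspace = H5Screate_simple(1, &local_elems, NULL);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, &offset, NULL,
                        &local_elems, NULL);

    hid_t xfer = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(xfer, H5FD_MPIO_COLLECTIVE);  /* collective MPI-IO write */
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, xfer, local_buf);

    H5Pclose(xfer);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);   /* my 180 s timer stops here */
}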

After writing the big dataset, I write a small "metadata" dataset, containing 
details of the run. This dataset is very small, and contains various data 
types, while the big dataset contains only doubles.

My problem: the HDF5 file keeps being updated (I watch the last-modified date) 
long after the big dataset is written, roughly 10-15 minutes after.

Is it possible that writing the small dataset at the end takes that much time, 
while the big dataset is so quick to write? In the 10-15 minutes after writing 
the big dataset, I see next to nothing happening on our Lustre filesystem.


Any idea what may be going on ?



--
---------------------------------
Maxime Boissonneault
Computing analyst - Calcul Québec, Université Laval
Ph.D. in Physics


_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
_______________________________________________