So, just to be clear...my suggestion was to call H5Fflush() "...relatively 
regularly...". I guess I was intentionally vague because I think it's obvious 
that there definitely *will be* a performance hit, and you then have to trade 
off the risk of losing data against the loss of performance.
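
Something like the following is what I had in mind (a rough C sketch against 
the HDF5 API, not tested; FLUSH_INTERVAL and the function name are made up, 
and the dataspace-selection bookkeeping is elided):

#include "hdf5.h"

/* Hypothetical write loop: flush every FLUSH_INTERVAL writes, so a crash
 * loses at most the last FLUSH_INTERVAL writes' worth of cached data,
 * while paying the flush cost only occasionally. */
#define FLUSH_INTERVAL 100

void write_samples(hid_t file_id, hid_t dset_id, const double *buf,
                   hid_t memspace, hid_t filespace, long nwrites)
{
    for (long i = 0; i < nwrites; i++) {
        /* (in real code the filespace selection would advance each pass) */
        H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, memspace, filespace,
                 H5P_DEFAULT, buf);
        if ((i + 1) % FLUSH_INTERVAL == 0)
            H5Fflush(file_id, H5F_SCOPE_GLOBAL); /* push caches to disk */
    }
}

Tuning FLUSH_INTERVAL is exactly the tradeoff above: smaller means less data 
at risk, larger means less performance lost.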

This thread does make me wonder about something though (and this may be a 
question for THG)...is the corruption "localized" (or is there a way of 
guaranteeing that it will be localized) to only the most recently written 
objects? Or, is it the case that *all data* written to the file in the past is 
at risk of corruption if a failure occurs in the "current" operation?

Obviously, if it's just the most recently written data that is at risk of 
corruption, that is much more tolerable. And I thought that one of the 
command-line tools could help to 'fix' a broken HDF5 file, but I can't remember 
which one now (h5debug maybe?)

Mark


"Hdf-forum on behalf of Ger van Diepen" wrote:


I fear that a flush after each write can be quite expensive. Furthermore, I do 
not know whether HDF5 guarantees that the file will be uncorrupted if a failure 
occurs during the write of data (in between flushes).


Another option is to write the data into an external raw data file and make 
links to that file (or to segments of it) in the HDF5 file. This is described 
in section 5.5.4 of the HDF5 user's guide. In case of an unexpected failure, it 
is always possible to make the links afterwards.

We make use of it in our LOFAR data writer.
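
Roughly, the external-storage side of it looks like this (a sketch only; the 
file and dataset names are made up): the dataset creation property list points 
at a raw file, so the bulk data lives outside the HDF5 file itself.

#include "hdf5.h"

/* Sketch: store a dataset's raw data in an external file ("data.raw").
 * If the writer dies, the raw bytes survive in data.raw, and the HDF5
 * metadata/links can be recreated afterwards. */
hid_t make_external_dset(hid_t file_id)
{
    hsize_t dims[1] = { 1000 };
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);

    /* raw data lives in data.raw, starting at offset 0 */
    H5Pset_external(dcpl, "data.raw", 0, 1000 * sizeof(double));

    hid_t dset = H5Dcreate2(file_id, "/samples", H5T_NATIVE_DOUBLE,
                            space, H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}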



>>> Ewan Makepeace <makepe...@jawasoft.com> 28-Sep-17 5:48 >>>
Thank you to all of you who have replied to my query on this (appended at the 
bottom).



To summarise, the replies (with my feedback inline) were:



From: "Miller, Mark C." <mille...@llnl.gov<mailto:mille...@llnl.gov>>



Well, I think it is best to close datasets, dataspaces, types, and groups as 
soon as you know you no longer need them. That should help to minimize memory 
usage. Also, can you possibly add a call to H5Fflush() 
(https://support.hdfgroup.org/HDF5/doc/RM/RM_H5F.html#File-Flush) so that it 
happens relatively regularly? Can you possibly do something like, on Linux, 
*catching* a signal and then calling H5Fclose() on the file as part of the 
signal handler? Are you by chance calling H5dont_atexit() 
(https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html#Library-DontAtExit) 
somewhere, which would prevent HDF5 from closing the file gracefully upon exit? 
(FYI, these are all Linux-isms, so I don't know if they will be of much use to 
you in your context.)
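
A rough sketch of the signal idea (untested, and note that HDF5 calls are not 
async-signal-safe, so this is strictly best-effort):

#include <signal.h>
#include <unistd.h>
#include "hdf5.h"

static hid_t g_file_id = -1;   /* file handle visible to the handler */

/* Best-effort cleanup: closing the file here flushes HDF5's caches,
 * which beats losing the file outright, but it can itself fail since
 * the HDF5 library is not async-signal-safe. */
static void on_term(int sig)
{
    (void)sig;
    if (g_file_id >= 0)
        H5Fclose(g_file_id);
    _exit(1);
}

int main(void)
{
    signal(SIGTERM, on_term);
    signal(SIGINT,  on_term);
    g_file_id = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    /* ... writes ... */
    H5Fclose(g_file_id);
    return 0;
}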



We have written a completely object-oriented layer to manage the references - 
all the objects get disposed correctly (and in the right order) in normal 
operation. The problem (as others have pointed out) is that the caching in HDF5 
leaves the files in an unpredictable and often invalid state when we terminate 
unexpectedly.



We will try adding a call to H5Fflush() after every write, which may solve the 
issue, although at what cost in performance I do not know.



One of the DOE labs invested in a 'journaling metadata' enhancement to HDF5. I 
think that work was nearly completed. However, it has since stalled on a 
private branch and has yet to be merged into the mainline of the code. It might 
be worth making a pitch for that if you think it could be useful in this 
context. Again, I am not sure, because all my experience is Linux-centric.



This does sound like a problem that would be solved by file journaling - but in 
the absence of library support it is not an option.



"Jager, Gerco de" <g.d.ja...@marin.nl<mailto:g.d.ja...@marin.nl>>



I recently started writing a converter from our proprietary measurement format 
to HDF5 in C#, using the HDF.PInvoke NuGet distribution. I've read that 
HDF.PInvoke is the way forward, and hopefully it exposes all the features you 
need.



I am aware that HDF5.net is deprecated and systems based on PInvoke are 
recommended, but we have had almost no issues with it so far - as I said, if 
the system does not stop while writing (due to exceptions in other code 
unrelated to the persistence layer) the file is never corrupted. In fact, I 
suspect that the problem is in the caching of data and doubt PInvoke will solve 
it.



From: Quincey Koziol <koz...@lbl.gov>



Hi Ewan,
There are two things you can do to address the file corruption issues:

- For the near term, use the techniques and code for managing the metadata 
cache described here:  
https://support.hdfgroup.org/HDF5/docNewFeatures/FineTuneMDC/RFC%20H5Ocork%20v5%20new%20fxn%20names.pdf

- In the next year or so, we will be finishing the "SWMR" feature, described 
here:  
https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html

The metadata cache techniques are rather unsubtle, but will avoid corrupted 
files until the "full" SWMR feature is finished.
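
As a rough illustration of the corking technique from that RFC (assuming HDF5 
1.10+, where the calls shipped as H5Odisable_mdc_flushes() and 
H5Oenable_mdc_flushes(); a sketch, not a tested recipe):

#include "hdf5.h"

/* Sketch: "cork" an object's metadata in the cache while a multi-step
 * update is in flight, so a half-finished update is never flushed to
 * disk; uncork and flush once the object is consistent again. */
void update_dataset(hid_t file_id, hid_t dset_id, const double *buf)
{
    H5Odisable_mdc_flushes(dset_id);       /* cork: pin metadata in cache */

    H5Dwrite(dset_id, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
             H5P_DEFAULT, buf);            /* ...plus any attribute edits... */

    H5Oenable_mdc_flushes(dset_id);        /* uncork */
    H5Fflush(file_id, H5F_SCOPE_GLOBAL);   /* flush a consistent state */
}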



This is fascinating stuff. The SWMR features have a lot of application for us 
but seem to be taking longer than originally expected. The metadata management 
tools are of interest - but I am not sure we need fine-grained control here - 
we basically need the file to be valid after every write, so I think the first 
thing to try is simply flushing the whole file on every write.



The other option I am considering is to remove our HDF5 code from the assembly 
and run it as a standalone service so that in the event of a crash in our 
application the HDF5 service is still running and hopefully able to flush and 
close the file gracefully.



rgds,

Ewan



Original Question:



Dear Experts,

We are building a data acquisition and processing system on top of an HDF5 file 
store. Generally we have been very pleased with HDF5 - great flexibility in 
data structure, good performance, small file size, availability of third-party 
data access tools, etc.

However our system needs to run for 36-48 hours at a time - and we are finding 
that if we (deliberately or accidentally) stop the process while running (and 
writing data) the file is corrupted and we lose all our work.

We are in C# and wrote our access routines on top of HDF5.net (which I 
understand is deprecated). We tend to keep all active pointer objects open for 
the duration of the process that reads or writes them (file, group, and dataset 
handles in particular).

1) Is there a full-featured replacement for HDF5.net now that I am unaware of? 
Previous contenders were found to be missing support for features we depend on. 
If so, will it address the corruption issue?

2) Should we be opening and closing all the entities on every write? I would 
have thought that would dramatically slow access but perhaps not. Guidance?

3) Are there any other tips for making the file less susceptible to corruption 
if writing is abandoned unexpectedly?

Please help - this issue could be serious enough to make us reconsider our 
storage choice, which would be expensive now.

rgds,
Ewan