Hmm. If I understand you, you have written code that you believe produces an 
HDF5 file according to the 3.0 file version specification, 
https://support.hdfgroup.org/HDF5/doc/H5.format.html but nevertheless does NOT 
use the HDF5 library to do it. Furthermore, where 'extended padding' is 
concerned, your implementation does business differently than the HDF5 
implementation.

You can prove HDF5 tools will *read* the file ok. But, in a read-modify-write 
scenario, the file is getting corrupted by the HDF5 library due to the difference 
in how the two implementations handle the extended padding -- a feature that 
you explain is '...not defined at all -- not even recommended'.

Is that about right?

If so, it does indeed sound like a potential issue in the file format 
specification for HDF5.

Your scenario sounds like a super useful test case...does a wholly independent 
implementation produce a file the HDF5 library can "handle"?

I wonder if there are settings in the HDF5 library you may need to set (such as 
alignment or block size or something) so that read-modify-write will indeed 
work ok. I wonder if there is some metadata missing from your file that would 
inform the HDF5 library what specific settings it must use to properly read and 
write to the file. I wonder if there is some boot-block information you have 
neglected to include, so that the HDF5 library is not aware of all the 
parameters affecting the file's layout.
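
One way to start checking the boot-block (superblock) question is to decode its leading fields by hand and compare them with the real file. The sketch below is only an illustration under assumptions: it uses the field offsets as I read them in the file format specification, it assumes the superblock sits at file offset 0, and the helper name `read_superblock` is made up for this example. It does not use the HDF5 library.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_superblock(data: bytes) -> dict:
    """Decode leading superblock fields, assuming the superblock is at offset 0."""
    if data[:8] != HDF5_SIGNATURE:
        raise ValueError("no HDF5 signature at offset 0")
    version = data[8]
    if version in (0, 1):
        size_of_offsets, size_of_lengths = data[13], data[14]
        # Version 1 has 4 extra bytes (indexed storage K + reserved)
        # in front of the address fields.
        first_address = 24 if version == 0 else 28
    elif version in (2, 3):
        size_of_offsets, size_of_lengths = data[9], data[10]
        first_address = 12
    else:
        raise ValueError(f"unknown superblock version {version}")

    def address(index: int) -> int:
        # Address fields are little-endian and size_of_offsets bytes wide.
        start = first_address + index * size_of_offsets
        field = data[start:start + size_of_offsets]
        if len(field) != size_of_offsets:
            raise ValueError("superblock is truncated")
        return int.from_bytes(field, "little")

    return {
        "version": version,
        "size_of_offsets": size_of_offsets,
        "size_of_lengths": size_of_lengths,
        "base_address": address(0),
        # Third address field in every superblock version handled here.
        "end_of_file_address": address(2),
    }
```

Reading the bytes of sizeoptimized.h5 and comparing `end_of_file_address` with the actual file length would be a cheap first check. If the two disagree, that might be one reason the library places new blocks where existing ones already live, but that is a guess on my part, not something I have verified.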

The only reason I am calling so many aspects of your implementation into 
question is that the HDF5 file format is fairly complex. I don't think it 
is easily duplicated without using the library itself. So, I think it's highly 
likely you are overlooking some important features of the format necessary 
for the HDF5 library to fully handle it.

All that said, I commend your courage for attempting it and hope others can 
chime in with more detailed thoughts on what to do about it.

Mark



"Hdf-forum on behalf of Krug, Markus" wrote:

Dear all,

I just came across an interesting issue.
I implemented the writing of HDF files on an embedded system. The amount of 
functionality I implemented is significantly less than the HDF lib offers, so it 
is just tailored to my needs. I implemented everything based on the HDF 3.0 
file spec. One point of my tailoring was to optimize the file size. Therefore, 
I write every internal block in the HDF files directly after the previous one, 
byte by byte, or padded to the address alignment where the HDF file 
specification. The HDF files generated by HDFview or Matlab have plenty of 
space in-between the internal blocks. Sometimes a few hundred bytes. As far as 
I read from the HDF file specification this ‘extended padding’ is not defined 
at all – not even recommended.
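
The tight packing described above can be sketched in a few lines of Python. This is only an illustration: the block names, sizes and alignment values are hypothetical and are not taken from the attached files or from the specification, and `align_up` and `pack_blocks` are helper names invented for this example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    if alignment < 1:
        raise ValueError("alignment must be at least 1")
    return (offset + alignment - 1) // alignment * alignment


def pack_blocks(blocks, start=0):
    """Place blocks back to back, padding only where an alignment is required.

    blocks is a list of (name, size, alignment) tuples; an alignment of 1
    means the block may start at any byte.
    Returns the list of (name, offset, size) and the resulting file size.
    """
    layout = []
    offset = start
    for name, size, alignment in blocks:
        offset = align_up(offset, alignment)
        layout.append((name, offset, size))
        offset += size
    return layout, offset


# Hypothetical example: only the second block asks for 8-byte alignment.
layout, file_size = pack_blocks([
    ("superblock", 45, 1),
    ("object_header", 30, 8),
    ("raw_data", 16, 1),
])
```

With these made-up numbers the only padding in the file is the 3 bytes in front of the aligned block, which is the kind of layout a size-optimized writer aims for.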
However, this ‘extended padding’ that is performed by the HDF lib leads to 
behavior that I would consider an incompatibility with itself. To demonstrate 
this I attached two HDF files to this email. The first (sizeoptimized.h5) is 
generated by my embedded software and is optimized for file size. It 
contains three compounds, each of which has 2 elements. You should be able 
to open that file in HDFview or similar tools and read all its contents.
The second file (sizeoptimizedextended.h5) is generated by HDFview by adding a 
fourth compound after the sizeoptimized.h5 file was opened in HDFview. You can 
see that the file is partly corrupted. The reason for this is that HDFview (and 
therefore the HDF lib, I guess) does not really take into account the position of 
the internal blocks of a file that it is writing to. It seems to me it has some 
internal mapping of those blocks. This mapping gets applied even if it will 
collide, and therefore corrupt, the existing blocks.
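
The kind of collision described above can be checked mechanically once the offset and size of every block are known. The following Python sketch is an illustration only: the block list is hypothetical (it is not extracted from sizeoptimizedextended.h5), and `find_overlaps` is a helper name invented for this example.

```python
def find_overlaps(blocks):
    """Return pairs of block names whose byte ranges intersect.

    blocks is a list of (name, offset, size) tuples.
    """
    ordered = sorted(blocks, key=lambda block: block[1])
    overlaps = []
    for i, (name_a, offset_a, size_a) in enumerate(ordered):
        end_a = offset_a + size_a
        for name_b, offset_b, _ in ordered[i + 1:]:
            if offset_b >= end_a:
                # Blocks are sorted by offset, so no later block can overlap.
                break
            overlaps.append((name_a, name_b))
    return overlaps


# Hypothetical layout: a newly written block lands inside an existing one.
collisions = find_overlaps([
    ("compound_1", 0, 100),
    ("compound_2", 100, 100),
    ("new_compound_4", 150, 80),
])
```

Here `collisions` contains the single pair `("compound_2", "new_compound_4")`, because the new block starts at byte 150 while the existing block still occupies bytes 100 to 199.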
If my observation is correct, I think either the HDF lib needs a bugfix or the 
HDF file spec needs a description of how the internal blocks are allowed to be 
positioned within an HDF file.
I forgot to mention that I tried to take the HDF lib sources and compile them for 
my system. However, I quit after a couple of days because the way the sources 
are written is not at all suitable for adapting them to an embedded system that 
runs a simplified file system and a real-time operating system – and all of it 
has to fit into a few hundred kilobytes.

Can anyone comment on my observation?


Best Regards
Markus