Hi Jonathan,

Thanks for your reply.
1. The size difference is between the file sizes of two HDF5 files: one compressed by my code and one compressed by h5repack. (There is also an original, uncompressed file, which is the same size as the file produced by my compression code.)

2. The chunk size I used is the number of items written to the dataset, which in this case was 32509. If I open the two files with the HDF5 viewer, this value is shown in the properties of both files.

When creating the dataset I am not really sure whether to use dataTypeFile or dataTypeMem. I tried both, and the results are the same (at least, the difference between my code and h5repack stays the same; the file sizes of both files do change, however). See the sketch below for what I tried.
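To be concrete, this is roughly how I set up the chunking and pass the two type IDs (a sketch, not my exact code; the 4096-element chunk is just an alternative value I experimented with, and fileId, dataSpace, dataTypeFile, dataTypeMem and data are set up as in the code quoted below):

// Smaller fixed chunk instead of chunking the full 32509-element extent.
long[] chunk = { 4096 };

H5PropertyListId dcpl = H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setShuffle(dcpl);
H5P.setDeflate(dcpl, 9);
H5P.setChunk(dcpl, chunk);

// As I understand it, the dataset is created with the *file* type...
H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
    new H5PropertyListId(H5P.Template.DEFAULT), dcpl,
    new H5PropertyListId(H5P.Template.DEFAULT));

// ...and written with the *memory* type, so the library converts between them.
H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
    new H5DataSpaceId(H5S.H5SType.ALL),
    new H5PropertyListId(H5P.Template.DEFAULT),
    new H5Array<DataStruct>(data));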
Regards,
Bas

On Thu, Jul 7, 2011 at 6:15 PM, Jonathan Kim <[email protected]> wrote:
> Hi Bas,
>
> I have a couple of questions:
> 1. About the size difference between h5repack and your code: is it the
> size of the HDF5 file or of the dataset?
> 2. About the chunk: what chunk size was used for h5repack and for your
> code?
>
> Jonathan
>
> On 7/7/2011 10:00 AM, Bas Schoen wrote:
>
> Hi,
>
> I'm trying to create an HDF5 file with some compound datatypes with GZIP
> compression. The development is done in C# using the HDF5DotNet DLL.
> I need these compression options: shuffle & gzip=9, and I would like to
> achieve the same compression ratio as h5repack.
>
> The problem, however, is that the compressed file is the same size as the
> uncompressed file. If I run h5repack on that file, the result is 10 times
> smaller. Can someone see what I am doing wrong?
>
> Part of my implementation:
>
> // We want to write a compound datatype: a struct containing an
> // int and some bytes.
> DataStruct[] data = new DataStruct[]{...}; // data has been filled
>
> // Create the compound datatype for memory
> H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND,
>     (int)Marshal.SizeOf(default(DataStruct)));
> H5T.insert(dataTypeMem, "A", (int)Marshal.OffsetOf(typeof(DataStruct), "A"),
>     H5T.H5Type.NATIVE_INT);
> H5T.insert(dataTypeMem, "B", (int)Marshal.OffsetOf(typeof(DataStruct), "B"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "C", (int)Marshal.OffsetOf(typeof(DataStruct), "C"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "D", (int)Marshal.OffsetOf(typeof(DataStruct), "D"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "E", (int)Marshal.OffsetOf(typeof(DataStruct), "E"),
>     H5T.H5Type.NATIVE_UCHAR);
>
> // Create the compound datatype for the file. Because the standard
> // types we are using for the file may have different sizes than
> // the corresponding native types, we must manually calculate the
> // offset of each member.
> int offset = 0;
> H5DataTypeId dataTypeFile = H5T.create(H5T.CreateClass.COMPOUND,
>     (int)(4 + 1 + 1 + 1 + 1));
> H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
> offset += 4;
> H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
>
> long[] dims = { (long)data.Count() };
>
> try
> {
>     // Create dataspace, with maximum = current
>     H5DataSpaceId dataSpace = H5S.create_simple(1, dims);
>
>     // Create compression properties
>     long[] chunk = dims; // What value should be used as chunk?
>     H5PropertyListId compressProperty =
>         H5P.create(H5P.PropertyListClass.DATASET_CREATE);
>     H5P.setShuffle(compressProperty);
>     H5P.setDeflate(compressProperty, 9);
>     H5P.setChunk(compressProperty, chunk);
>
>     // Create the dataset
>     H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
>         new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
>         new H5PropertyListId(H5P.Template.DEFAULT));
>
>     // Write data to it
>     H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
>         new H5DataSpaceId(H5S.H5SType.ALL),
>         new H5PropertyListId(H5P.Template.DEFAULT),
>         new H5Array<DataStruct>(data));
>
>     // Cleanup
>     H5T.close(dataTypeMem);
>     H5T.close(dataTypeFile);
>     H5D.close(dataSet);
>     H5P.close(compressProperty);
>     H5S.close(dataSpace);
> }
> catch
> {
>     ...
> }
>
> To summarize, the steps are: creating the datatypes for both file and
> memory, creating the dataspace, creating the dataset with shuffle and
> compression creation properties, and finally writing the data to the file.
> It might be a bit difficult to check this code, but are there any steps
> missing/incorrect?
>
> Help appreciated.
>
> Best regards,
>
> Bas Schoen
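For completeness, DataStruct in the quoted code is a plain struct along these lines (a sketch; with LayoutKind.Sequential and Pack = 1, Marshal.SizeOf returns the packed 8-byte size that matches the file type):

using System.Runtime.InteropServices;

// Sketch of the compound element. Sequential layout with Pack = 1 makes
// Marshal.SizeOf(default(DataStruct)) equal 8 bytes (4 + 1 + 1 + 1 + 1),
// matching the size given to dataTypeFile above.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct DataStruct
{
    public int A;  // member "A", stored as STD_U32BE in the file
    public byte B; // members "B".."E", stored as STD_U8BE
    public byte C;
    public byte D;
    public byte E;
}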
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
