Hi Bas,

It sounds like h5repack did its job, but your code didn't behave as expected if the result files are the same size. Just to mention, h5repack can potentially reduce the size further when it processes an entire file, since it rewrites all the objects and reclaims space left over from previous changes.

According to your reply there are 3 HDF5 files: 1. the original, 2. the result from h5repack, 3. the result from your code. Could you send us either the output of "h5dump -p -H <HDF5 file>" or snapshots of HDFView's 'Show Properties' window for the three files?

Also, could you send me the exact h5repack command line you used?
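For reference, an invocation applying the filters you describe (shuffle plus gzip level 9) would look something like the following; the file names are placeholders:

```
h5repack -f SHUF -f GZIP=9 input.h5 output.h5
```

If your command differed, for example applying the filter only to particular datasets with the "<object>:GZIP=9" form, that detail would help us compare.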

Regards,

Jonathan

On 7/8/2011 2:44 AM, Bas Schoen wrote:
Hi Jonathan,

Thanks for your reply.

1. The size difference is between the file sizes of two HDF5 files: one compressed with my code and one produced by h5repack (plus the original uncompressed file, which is the same size as the one my compression code produces).
2. The chunk size I used is the number of items written to the dataset, which in this case was 32509. If I open the two files with HDFView, this value is shown in the properties of both files.
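For what it's worth, a quick standalone sanity check of those numbers (plain Python, independent of HDF5DotNet; the 8-byte record size is the 4 + 1 + 1 + 1 + 1 file type from my snippet below):

```python
import math

def chunk_layout(n_elements, record_bytes, chunk_elements):
    """Return (number of chunks, uncompressed bytes per full chunk)."""
    n_chunks = math.ceil(n_elements / chunk_elements)
    return n_chunks, chunk_elements * record_bytes

# One chunk covering all 32509 records of the 8-byte compound type:
print(chunk_layout(32509, 8, 32509))  # -> (1, 260072)
```

So the whole dataset is a single chunk of about 254 KB, well under the 1 MB default chunk cache, so the chunk size by itself should not be what prevents compression.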

When creating the dataset I am not really sure whether to use the dataTypeFile or the dataTypeMem. I tried both and the result is the same (at least the difference between my code and h5repack stays the same; the file sizes of both files do, however, change).


Regards,

Bas


On Thu, Jul 7, 2011 at 6:15 PM, Jonathan Kim <[email protected]> wrote:

    Hi Bas,

    I have a couple of questions.
      1. About the size difference between h5repack and your code: is
    it the size of the HDF5 file or of the dataset?
      2. About the chunking: what chunk size was used for h5repack and
    for your code?

    Jonathan

    On 7/7/2011 10:00 AM, Bas Schoen wrote:
    Hi,

    I'm trying to create an HDF5 file containing some compound
    datatypes with GZIP compression. Development is done in C# using
    the HDF5DotNet DLL.
    I need these compression options: shuffle and gzip=9, and I would
    like to achieve the same compression ratio as h5repack.

    The problem, however, is that the compressed file is the same size
    as the uncompressed file. If I run h5repack on that file, the
    result is 10 times smaller. Can someone see what I am doing wrong?

    Part of my implementation:


    // We want to write a compound datatype: a struct containing
    // an int and some bytes
    DataStruct[] data = new DataStruct[] { ... };  // data has been filled

    // Create the compound datatype for memory
    H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND,
        (int)Marshal.SizeOf(default(DataStruct)));
    H5T.insert(dataTypeMem, "A",
        (int)Marshal.OffsetOf(typeof(DataStruct), "A"), H5T.H5Type.NATIVE_INT);
    H5T.insert(dataTypeMem, "B",
        (int)Marshal.OffsetOf(typeof(DataStruct), "B"), H5T.H5Type.NATIVE_UCHAR);
    H5T.insert(dataTypeMem, "C",
        (int)Marshal.OffsetOf(typeof(DataStruct), "C"), H5T.H5Type.NATIVE_UCHAR);
    H5T.insert(dataTypeMem, "D",
        (int)Marshal.OffsetOf(typeof(DataStruct), "D"), H5T.H5Type.NATIVE_UCHAR);
    H5T.insert(dataTypeMem, "E",
        (int)Marshal.OffsetOf(typeof(DataStruct), "E"), H5T.H5Type.NATIVE_UCHAR);

    // Create the compound datatype for the file. Because the standard
    // types we are using for the file may have different sizes than
    // the corresponding native types, we must manually calculate the
    // offset of each member.
    int offset = 0;
    H5DataTypeId dataTypeFile = H5T.create(H5T.CreateClass.COMPOUND,
        4 + 1 + 1 + 1 + 1);
    H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
    offset += 4;
    H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
    offset += 1;
    H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
    offset += 1;
    H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
    offset += 1;
    H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
    offset += 1;

    long[] dims = { (long)data.Count() };

    try
    {
        // Create dataspace, with maximum = current
        H5DataSpaceId dataSpace = H5S.create_simple(1, dims);

        // Create compression properties
        long[] chunk = dims;  // What value should be used as chunk?
        H5PropertyListId compressProperty =
            H5P.create(H5P.PropertyListClass.DATASET_CREATE);
        H5P.setShuffle(compressProperty);
        H5P.setDeflate(compressProperty, 9);
        H5P.setChunk(compressProperty, chunk);

        // Create the dataset
        H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile,
            dataSpace, new H5PropertyListId(H5P.Template.DEFAULT),
            compressProperty, new H5PropertyListId(H5P.Template.DEFAULT));

        // Write data to it
        H5D.write(dataSet, dataTypeMem,
            new H5DataSpaceId(H5S.H5SType.ALL),
            new H5DataSpaceId(H5S.H5SType.ALL),
            new H5PropertyListId(H5P.Template.DEFAULT),
            new H5Array<DataStruct>(data));

        // Cleanup
        H5T.close(dataTypeMem);
        H5T.close(dataTypeFile);
        H5D.close(dataSet);
        H5P.close(compressProperty);
        H5S.close(dataSpace);
    }
    catch
    {
        ...
    }


    The steps are: creating the datatypes for both file and memory,
    creating the dataspace, creating the dataset with shuffle and
    deflate set on the dataset-creation property list, and finally
    writing the data to the file.
    It might be a bit difficult to check this code, but are there any
    steps missing or incorrect?

    Help appreciated.

    Best regards,

    Bas Schoen





    _______________________________________________
    Hdf-forum is for HDF software users discussion.
    [email protected]
    http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org








