Hi Jonathan,
I've attached the output from h5dump for 3 files: the original, the one compressed with my code, and the one produced by h5repack.
I ran h5repack with the following command: "h5repack -f SHUF -f GZIP=9 <input.h5> <output.h5>".
Just to make sure: the problems I'm having are not really related to
h5repack. My implementation of gzip compression just doesn't compress the
HDF5 file at all. Or even worse, if I use a small chunk size (say 20), the
file size increases compared to the original HDF5 file.
To make things easier, I've written a small test function that writes a
plain int array instead of a compound datatype, but my compression still
doesn't work. I get the feeling I'm missing an important step in the
compression process.
This is what I tried:

Random rand = new Random();
int[] data = new int[32508];

// Fill with some dummy data
for (int i = 0; i < 32508; i++)
    data[i] = rand.Next();

// Create the dataspace
H5DataSpaceId dataSpace = H5S.create_simple(1, new long[] { data.Length });

// Create the dataset creation property list with shuffle and gzip enabled
H5PropertyListId compressProperty =
    H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setShuffle(compressProperty);
H5P.setDeflate(compressProperty, 9);
H5P.setChunk(compressProperty, new long[] { data.Length });

// Create the dataset with compression enabled
H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE,
    dataSpace, new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
    new H5PropertyListId(H5P.Template.DEFAULT));

// This overload was used instead to create the dataset with compression off:
// H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace);

// Write the data to the file
H5D.write(dataSet, new H5DataTypeId(H5T.H5Type.NATIVE_INT),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5DataSpaceId(H5S.H5SType.ALL),
    new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<int>(data));

// Cleanup
H5P.close(compressProperty);
H5D.close(dataSet);
H5S.close(dataSpace);
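One caveat with this particular test (my own guess about what might be going
on): rand.Next() returns values spread over the whole int range, so the dummy
data has close to maximum entropy and deflate has almost nothing to squeeze
out, even with shuffle. To check whether the filter pipeline works at all, it
would be better to fill the array with low-entropy values, e.g.:

// Low-entropy dummy data; shuffle + deflate should compress this well
for (int i = 0; i < 32508; i++)
    data[i] = i % 100;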
Regards,
Bas
On Fri, Jul 8, 2011 at 5:03 PM, Jonathan Kim <[email protected]> wrote:
> Hi Bas,
>
> It sounds like h5repack did its work, but your code didn't do what you
> expected, given that the result files have the same size. Just to mention,
> h5repack can potentially reduce the size further when it processes the
> entire file, as it rewrites all the objects and reclaims space left over
> from previous changes.
>
> According to your reply there are 3 HDF5 files: 1. the original, 2. the
> result from h5repack, 3. the result from your code.
> Could you send us either the output from "h5dump -p -H <HDF5 file>" or
> screenshots from HDFView's 'Show Properties' window for the 3 files?
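> For example, redirecting each dump to a text file (-H prints only the
> header information, and -p adds the storage layout and filter properties):
>
>     h5dump -p -H original.h5 > original.txt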
>
> Also could you send me how you ran h5repack?
>
> Regards,
>
> Jonathan
>
> On 7/8/2011 2:44 AM, Bas Schoen wrote:
>
> Hi Jonathan,
>
> Thanks for your reply.
>
> 1. The size difference is between the file sizes of two HDF5 files: one
> compressed with my code and one with h5repack (plus an original,
> uncompressed file, which is the same size as the one produced by my
> compression code).
> 2. The chunk size I used is the number of items written to the dataset,
> which in this case was 32509. If I open the two files with the HDF5 viewer,
> this value is shown in the properties of both files.
>
> When creating the dataset I am not really sure whether to use dataTypeFile
> or dataTypeMem. I tried both and the results are the same (at least the
> difference between my code and h5repack stays the same; the file sizes of
> both files do change, however).
>
>
> Regards,
>
> Bas
>
>
> On Thu, Jul 7, 2011 at 6:15 PM, Jonathan Kim <[email protected]> wrote:
>
> Hi Bas,
>
> I have a couple of questions.
> 1. About the size difference between h5repack and your code: is it the
> size of the HDF5 file or of the dataset?
> 2. About the chunking: what chunk size was used for h5repack and for your
> code?
>
> Jonathan
>
> On 7/7/2011 10:00 AM, Bas Schoen wrote:
>
> Hi,
>
> I'm trying to create an HDF5 file containing some compound datatypes with
> GZIP compression. The development is done in C# using the HDF5DotNet dll.
> I need these compression options: shuffle & gzip=9, and I would like to
> achieve the same compression ratio as h5repack.
>
> The problem, however, is that the compressed file is the same size as the
> uncompressed file. If I run h5repack on that file, the result is 10 times
> smaller. Can someone see what I am doing wrong?
>
> Part of my implementation:
>
>
> // We want to write a compound datatype, which is a struct containing an
> // int and some bytes
> DataStruct[] data = new DataStruct[] { ... }; // data has been filled
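> // (Assumed definition of DataStruct, which isn't shown in my snippet; a
> // sequential layout is needed so that Marshal.SizeOf and Marshal.OffsetOf
> // report offsets matching the declared field order:)
> [StructLayout(LayoutKind.Sequential)]
> struct DataStruct
> {
>     public int A;           // 4 bytes
>     public byte B, C, D, E; // 1 byte each
> }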
>
> // Create the compound datatype for memory
> H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND,
>     (int)Marshal.SizeOf(default(DataStruct)));
> H5T.insert(dataTypeMem, "A", (int)Marshal.OffsetOf(typeof(DataStruct), "A"),
>     H5T.H5Type.NATIVE_INT);
> H5T.insert(dataTypeMem, "B", (int)Marshal.OffsetOf(typeof(DataStruct), "B"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "C", (int)Marshal.OffsetOf(typeof(DataStruct), "C"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "D", (int)Marshal.OffsetOf(typeof(DataStruct), "D"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "E", (int)Marshal.OffsetOf(typeof(DataStruct), "E"),
>     H5T.H5Type.NATIVE_UCHAR);
>
> // Create the compound datatype for the file. Because the standard
> // types we are using for the file may have different sizes than
> // the corresponding native types, we must manually calculate the
> // offset of each member.
> int offset = 0;
> H5DataTypeId dataTypeFile =
>     H5T.create(H5T.CreateClass.COMPOUND, (int)(4 + 1 + 1 + 1 + 1));
> H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
> offset += 4;
> H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
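> // Sanity check (my addition, not in the original snippet): the accumulated
> // offset should now equal the total size passed to H5T.create above.
> System.Diagnostics.Debug.Assert(offset == 4 + 1 + 1 + 1 + 1);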
>
> long[] dims = { (long) data.Count() };
>
> try
> {
> // Create dataspace, with maximum = current
> H5DataSpaceId dataSpace = H5S.create_simple(1, dims);
>
> // Create compression properties
> long[] chunk = dims; // What value should be used as the chunk size?
> H5PropertyListId compressProperty =
>     H5P.create(H5P.PropertyListClass.DATASET_CREATE);
> H5P.setShuffle(compressProperty);
> H5P.setDeflate(compressProperty, 9);
> H5P.setChunk(compressProperty, chunk);
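> // (A side note rather than a definitive answer to the question above: the
> // chunk is HDF5's unit of I/O and of compression. One chunk covering the
> // whole dataset is fine for a one-shot write like this; very small chunks,
> // e.g. 20 records, add per-chunk index overhead that can easily outweigh
> // any compression gain.)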
>
> // Create the data set
> H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
>     new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
>     new H5PropertyListId(H5P.Template.DEFAULT));
>
> // Write data to it
> H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
>     new H5DataSpaceId(H5S.H5SType.ALL),
>     new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<DataStruct>(data));
>
> // Cleanup
> H5T.close(dataTypeMem);
> H5T.close(dataTypeFile);
> H5D.close(dataSet);
> H5P.close(compressProperty);
> H5S.close(dataSpace);
> }
> catch
> {
> ...
> }
>
>
> To summarize, the steps are: creating the datatypes for both file and
> memory, creating the dataspace, creating the dataset with shuffle and
> deflate in its creation property list, and finally writing the data to the
> file.
> It might be a bit difficult to check this code, but are there any steps
> missing or incorrect?
>
> Help appreciated.
>
> Best regards,
>
> Bas Schoen
>
HDF5 "D:\original.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CONTIGUOUS
SIZE 520144
OFFSET 2360
}
FILTERS {
NONE
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_LATE
}
}
}
}

HDF5 "D:\comp.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CHUNKED ( 32509 )
SIZE 520144 (1.000:1 COMPRESSION)
}
FILTERS {
PREPROCESSING SHUFFLE
COMPRESSION DEFLATE { LEVEL 9 }
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_INCR
}
}
}
}

HDF5 "D:\repack.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CHUNKED ( 32509 )
SIZE 260377 (1.998:1 COMPRESSION)
}
FILTERS {
PREPROCESSING SHUFFLE
COMPRESSION DEFLATE { LEVEL 9 }
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_INCR
}
}
}
}

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org