Hi Jonathan,
I've attached the output from h5dump for 3 files: the original, the one compressed with my code, and the one produced by h5repack.
I ran h5repack with the following command: "h5repack -f SHUF -f GZIP=9 <input.h5> <output.h5>".
Just to make sure: the problems I'm having are not really related to
h5repack. My implementation of gzip compression just doesn't compress the
HDF5 file at all. Or even worse, if I use a small chunk size (say 20), the
file size increases compared to the original HDF5 file.
To make things easier, I've written a small test function that writes a
plain int array instead of a compound datatype, but my compression still
doesn't work. I get the feeling I'm missing an important step in the
compression process.
This is what I tried:

Random rand = new Random();
int[] data = new int[32508];

// Fill with some dummy data
for (int i = 0; i < 32508; i++)
    data[i] = rand.Next();

// Create the dataspace
H5DataSpaceId dataSpace = H5S.create_simple(1, new long[] { data.Length });

// Create the dataset creation property list with shuffle and gzip enabled
H5PropertyListId compressProperty =
    H5P.create(H5P.PropertyListClass.DATASET_CREATE);
H5P.setShuffle(compressProperty);
H5P.setDeflate(compressProperty, 9);
H5P.setChunk(compressProperty, new long[] { data.Length });

// Create the dataset with compression enabled
H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE,
    dataSpace, new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
    new H5PropertyListId(H5P.Template.DEFAULT));

// This overload was used instead to create the dataset with compression off:
// H5DataSetId dataSet = H5D.create(fileId, "Test", H5T.H5Type.STD_I32LE, dataSpace);

// Write the data to the file
H5D.write(dataSet, new H5DataTypeId(H5T.H5Type.NATIVE_INT),
    new H5DataSpaceId(H5S.H5SType.ALL), new H5DataSpaceId(H5S.H5SType.ALL),
    new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<int>(data));

// Cleanup
H5P.close(compressProperty);
H5D.close(dataSet);
H5S.close(dataSpace);
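One caveat with this particular test (my own guess about what might be going
on): rand.Next() returns values spread over the whole int range, so the dummy
data has close to maximum entropy and deflate has almost nothing to squeeze
out, even with shuffle. To check whether the filter pipeline works at all, it
would be better to fill the array with low-entropy values, e.g.:

// Low-entropy dummy data; shuffle + deflate should compress this well
for (int i = 0; i < 32508; i++)
    data[i] = i % 100;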
Regards,
Bas
On Fri, Jul 8, 2011 at 5:03 PM, Jonathan Kim <[email protected]> wrote:
> Hi Bas,
>
> It sounds like h5repack did its work, but your code didn't do what you
> expected, given that the result files have the same size. Just to mention,
> h5repack can potentially reduce the size further when it processes the
> entire file, as it rewrites all the objects and reclaims space left over
> from previous changes.
>
> According to your reply there are 3 HDF5 files: 1. the original, 2. the
> result from h5repack, 3. the result from your code.
> Could you send us either the output from "h5dump -p -H <HDF5 file>" or
> screenshots from HDFView's 'Show Properties' window for the 3 files?
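> For example, redirecting each dump to a text file (-H prints only the
> header information, and -p adds the storage layout and filter properties):
>
>     h5dump -p -H original.h5 > original.txt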
>
> Also could you send me how you ran h5repack?
>
> Regards,
>
> Jonathan
>
> On 7/8/2011 2:44 AM, Bas Schoen wrote:
>
> Hi Jonathan,
>
> Thanks for your reply.
>
> 1. The size difference is between the file sizes of two HDF5 files: one
> compressed with my code and one with h5repack (plus an original,
> uncompressed file, which is the same size as the one produced by my
> compression code).
> 2. The chunk size I used is the number of items written to the dataset,
> which in this case was 32509. If I open the two files with the HDF5 viewer,
> this value is shown in the properties of both files.
>
> When creating the dataset I am not really sure whether to use dataTypeFile
> or dataTypeMem. I tried both and the results are the same (at least the
> difference between my code and h5repack stays the same; the file sizes of
> both files do change, however).
>
>
> Regards,
>
> Bas
>
>
> On Thu, Jul 7, 2011 at 6:15 PM, Jonathan Kim <[email protected]> wrote:
>
> Hi Bas,
>
> I have a couple of questions.
> 1. About the size difference between h5repack and your code: is it the
> size of the HDF5 file or of the dataset?
> 2. About the chunking: what chunk size was used for h5repack and for your
> code?
>
> Jonathan
>
> On 7/7/2011 10:00 AM, Bas Schoen wrote:
>
> Hi,
>
> I'm trying to create an HDF5 file containing some compound datatypes with
> GZIP compression. The development is done in C# using the HDF5DotNet dll.
> I need these compression options: shuffle & gzip=9, and I would like to
> achieve the same compression ratio as h5repack.
>
> The problem, however, is that the compressed file is the same size as the
> uncompressed file. If I run h5repack on that file, the result is 10 times
> smaller. Can someone see what I am doing wrong?
>
> Part of my implementation:
>
>
> // We want to write a compound datatype, which is a struct containing an
> // int and some bytes
> DataStruct[] data = new DataStruct[] { ... }; // data has been filled
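> // (Assumed definition of DataStruct, which isn't shown in my snippet; a
> // sequential layout is needed so that Marshal.SizeOf and Marshal.OffsetOf
> // report offsets matching the declared field order:)
> [StructLayout(LayoutKind.Sequential)]
> struct DataStruct
> {
>     public int A;           // 4 bytes
>     public byte B, C, D, E; // 1 byte each
> }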
>
> // Create the compound datatype for memory
> H5DataTypeId dataTypeMem = H5T.create(H5T.CreateClass.COMPOUND,
>     (int)Marshal.SizeOf(default(DataStruct)));
> H5T.insert(dataTypeMem, "A", (int)Marshal.OffsetOf(typeof(DataStruct), "A"),
>     H5T.H5Type.NATIVE_INT);
> H5T.insert(dataTypeMem, "B", (int)Marshal.OffsetOf(typeof(DataStruct), "B"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "C", (int)Marshal.OffsetOf(typeof(DataStruct), "C"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "D", (int)Marshal.OffsetOf(typeof(DataStruct), "D"),
>     H5T.H5Type.NATIVE_UCHAR);
> H5T.insert(dataTypeMem, "E", (int)Marshal.OffsetOf(typeof(DataStruct), "E"),
>     H5T.H5Type.NATIVE_UCHAR);
>
> // Create the compound datatype for the file. Because the standard
> // types we are using for the file may have different sizes than
> // the corresponding native types, we must manually calculate the
> // offset of each member.
> int offset = 0;
> H5DataTypeId dataTypeFile =
>     H5T.create(H5T.CreateClass.COMPOUND, (int)(4 + 1 + 1 + 1 + 1));
> H5T.insert(dataTypeFile, "A", offset, H5T.H5Type.STD_U32BE);
> offset += 4;
> H5T.insert(dataTypeFile, "B", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "C", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "D", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
> H5T.insert(dataTypeFile, "E", offset, H5T.H5Type.STD_U8BE);
> offset += 1;
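> // Sanity check (my addition, not in the original snippet): the accumulated
> // offset should now equal the total size passed to H5T.create above.
> System.Diagnostics.Debug.Assert(offset == 4 + 1 + 1 + 1 + 1);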
>
> long[] dims = { (long) data.Count() };
>
> try
> {
> // Create dataspace, with maximum = current
> H5DataSpaceId dataSpace = H5S.create_simple(1, dims);
>
> // Create compression properties
> long[] chunk = dims; // What value should be used as the chunk size?
> H5PropertyListId compressProperty =
>     H5P.create(H5P.PropertyListClass.DATASET_CREATE);
> H5P.setShuffle(compressProperty);
> H5P.setDeflate(compressProperty, 9);
> H5P.setChunk(compressProperty, chunk);
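> // (A side note rather than a definitive answer to the question above: the
> // chunk is HDF5's unit of I/O and of compression. One chunk covering the
> // whole dataset is fine for a one-shot write like this; very small chunks,
> // e.g. 20 records, add per-chunk index overhead that can easily outweigh
> // any compression gain.)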
>
> // Create the data set
> H5DataSetId dataSet = H5D.create(fileId, "NAME", dataTypeFile, dataSpace,
>     new H5PropertyListId(H5P.Template.DEFAULT), compressProperty,
>     new H5PropertyListId(H5P.Template.DEFAULT));
>
> // Write data to it
> H5D.write(dataSet, dataTypeMem, new H5DataSpaceId(H5S.H5SType.ALL),
>     new H5DataSpaceId(H5S.H5SType.ALL),
>     new H5PropertyListId(H5P.Template.DEFAULT), new H5Array<DataStruct>(data));
>
> // Cleanup
> H5T.close(dataTypeMem);
> H5T.close(dataTypeFile);
> H5D.close(dataSet);
> H5P.close(compressProperty);
> H5S.close(dataSpace);
> }
> catch
> {
> ...
> }
>
>
> To summarize, the steps are: creating the datatypes for both file and
> memory, creating the dataspace, creating the dataset with shuffle and
> deflate in its creation property list, and finally writing the data to the
> file.
> It might be a bit difficult to check this code, but are there any steps
> missing or incorrect?
>
> Help appreciated.
>
> Best regards,
>
> Bas Schoen
>
HDF5 "D:\original.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CONTIGUOUS
SIZE 520144
OFFSET 2360
}
FILTERS {
NONE
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_LATE
}
}
}
}

HDF5 "D:\comp.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CHUNKED ( 32509 )
SIZE 520144 (1.000:1 COMPRESSION)
}
FILTERS {
PREPROCESSING SHUFFLE
COMPRESSION DEFLATE { LEVEL 9 }
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_INCR
}
}
}
}

HDF5 "D:\repack.h5" {
GROUP "/" {
DATASET "Test" {
DATATYPE H5T_COMPOUND {
H5T_STD_U32BE "A";
H5T_STD_U8BE "B";
H5T_STD_U8BE "C";
H5T_STD_U8BE "D";
H5T_STD_U8BE "E";
}
DATASPACE SIMPLE { ( 32509 ) / ( 32509 ) }
STORAGE_LAYOUT {
CHUNKED ( 32509 )
SIZE 260377 (1.998:1 COMPRESSION)
}
FILTERS {
PREPROCESSING SHUFFLE
COMPRESSION DEFLATE { LEVEL 9 }
}
FILLVALUE {
FILL_TIME H5D_FILL_TIME_IFSET
VALUE {
0,
0,
0,
0,
0
}
}
ALLOCATION_TIME {
H5D_ALLOC_TIME_INCR
}
}
}
}

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org