I have explored the problem and am now able to give an explanation, so I do not have to send you the files :)

After adding a bunch of printf statements, I traced the code and found that HDF5 was failing to allocate the buffer for compression. For the biggest files (datasets of 384 x 256 x 1024 values), the error occurs in the function H5Z_filter_deflate: HGOTO_ERROR(H5E_RESOURCE, H5E_NOSPACE, 0, "unable to allocate deflate destination buffer") is executed.

However, this error is not logged when I use the standard approach of defining an error handler:

herr_t error_handler(hid_t err_stack, void *unused) {
    /* print the stack passed to the 1.8-style callback; the 1.6-style
       H5Eprint1(stderr) does not take an error stack id */
    H5Eprint2(err_stack, stderr);
    return 0;
}
H5Eset_auto2(H5E_DEFAULT, error_handler, NULL);

How can I have this error properly logged? (The cause seems to be that H5_IS_API(H5Z_filter_deflate) = 0, so the automatic error handler is never invoked.) This is very important to me: execution completes normally, and the only consequence is that the compression is silently skipped and the written file is big.
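
In the meantime, the problem can at least be detected after the fact: since H5Dwrite succeeds even when the filter is skipped, one can compare the allocated storage with the logical size of the dataset. Below is a minimal sketch of this idea; check_compression() is my own helper name, not an HDF5 function, and the calls used (H5Dget_storage_size, H5Sget_simple_extent_npoints, etc.) are standard HDF5 1.8 API:

#include <stdio.h>
#include "hdf5.h"

/* Return 1 if the dataset is stored smaller than its logical
   (uncompressed) size, i.e. the deflate filter was applied. */
static int check_compression(hid_t dset)
{
    hid_t   space   = H5Dget_space(dset);
    hid_t   type    = H5Dget_type(dset);
    hsize_t logical = (hsize_t)H5Sget_simple_extent_npoints(space)
                      * H5Tget_size(type);
    hsize_t stored  = H5Dget_storage_size(dset);

    H5Sclose(space);
    H5Tclose(type);

    if (stored >= logical) {
        fprintf(stderr, "warning: dataset stored uncompressed "
                "(%llu >= %llu bytes)\n",
                (unsigned long long)stored,
                (unsigned long long)logical);
        return 0;
    }
    return 1;
}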


After I discovered the error, I realized that I was using a single chunk (we are not experienced HDF5 users, and we had no problems in the past with HDF4). I therefore tested the write with different chunk sizes on my PC (Intel Xeon E5530 @ 2.40 GHz, 4 cores, 4 GB RAM, Linux SLED 11) for a file containing a dataset of size 384 x 256 x 1200.

Here are my results:

Chunk size     Global max   HDF5 max   HDF5 memory   Write time
               memory       memory     at end        (seconds)
dataset size   43%          19%        0%            92
2^20           30%          6%         0%            64
2^15           30%          6%         0%            24
2^12           45%          21%        ~8%           22
2^10           (memory overflow)

(Note: the HDF5 memory footprints are deduced from the global memory footprints, because the system uses 24% when no HDF5 write is running and 1.5% at the end of the write.)

For the chunk sizes 2^20 and 2^15, why is the memory footprint the same?
For the size 2^12, things become unstable: the memory peak is greater than for 2^15 and 2^20, and the memory at the end seems to reveal a leak. For the size 2^10, things get worse: the program exits unexpectedly due to a memory overflow.

In my opinion, the write should remain stable for any chunk size that can be allocated, so the failure for 2^10 is a problem.
The writing algorithm should scale to small chunk sizes.

Am I missing something?

However, those results showed me that there is an optimal chunk size, but it seems to be strongly system/hardware-dependent. Do we have to calibrate the optimal chunk size for each system/hardware combination?
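
For reference, here is a minimal sketch of how such a chunked, deflate-compressed dataset is created with the HDF5 1.8 API. The 32 x 32 x 32 chunk shape (2^15 values) is only an illustrative choice matching one row of the table above, not the exact shape used in the tests:

#include "hdf5.h"

int main(void)
{
    hsize_t dims[3]  = {384, 256, 1200};
    hsize_t chunk[3] = {32, 32, 32};     /* 2^15 values per chunk */

    hid_t file  = H5Fcreate("test.h5", H5F_ACC_TRUNC,
                            H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(3, dims, NULL);

    /* dataset creation property list: chunking + deflate */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);
    H5Pset_deflate(dcpl, 6);             /* gzip level 6, as in the dumps */

    hid_t dset = H5Dcreate2(file, "data", H5T_STD_I32LE, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* ... fill a buffer and call H5Dwrite() here ... */

    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}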

On 11/14/2012 05:06 PM, Elena Pourmal wrote:
Hello,

Would it be possible for you to send us an example that demonstrates the problem? Could you please also send those two files to [email protected]?

It will also help if we know how many datasets you have in a data group when you see such behavior. Which version of the gzip library are you using? Which OS and compiler? Have you tried your application with the latest HDF5?

Thank you!

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~



On Nov 13, 2012, at 9:59 AM, ylegoc wrote:

Our instrument control software uses HDF5 files to store neutron acquisition data.
As the size of the "data" group grows, compression becomes random: sometimes the dataset is compressed, sometimes not. Here is the dump of two files containing the same dataset but with different resulting compression:

Bad file:

HDF5 "000028.nxs" {
GROUP "/" {
  ATTRIBUTE "HDF5_Version" {
     DATATYPE  H5T_STRING {
           STRSIZE 5;
           STRPAD H5T_STR_NULLTERM;
           CSET H5T_CSET_ASCII;
           CTYPE H5T_C_S1;
        }
     DATASPACE  SCALAR
  }
  GROUP "entry0" {
     ATTRIBUTE "NX_class" {
        DATATYPE  H5T_STRING {
              STRSIZE 7;
              STRPAD H5T_STR_NULLTERM;
              CSET H5T_CSET_ASCII;
              CTYPE H5T_C_S1;
           }
        DATASPACE  SCALAR
     }
     GROUP "data" {
        ATTRIBUTE "NX_class" {
           DATATYPE  H5T_STRING {
                 STRSIZE 6;
                 STRPAD H5T_STR_NULLTERM;
                 CSET H5T_CSET_ASCII;
                 CTYPE H5T_C_S1;
              }
           DATASPACE  SCALAR
        }
        DATASET "data" {
           DATATYPE  H5T_STD_I32LE
           DATASPACE  SIMPLE { ( 384, 256, 1024 ) / ( 384, 256, 1024 ) }
           STORAGE_LAYOUT {
              CHUNKED ( 384, 256, 1024 )
              SIZE 402653184 (1.000:1 COMPRESSION)
            }
           FILTERS {
              COMPRESSION DEFLATE { LEVEL 6 }
           }
           FILLVALUE {
              FILL_TIME H5D_FILL_TIME_IFSET
              VALUE  0
           }
           ALLOCATION_TIME {
              H5D_ALLOC_TIME_INCR
           }
           ATTRIBUTE "signal" {
              DATATYPE  H5T_STD_I32LE
              DATASPACE  SCALAR
           }
        }
     }

Correct file:

HDF5 "000029.nxs" {
GROUP "/" {
  ATTRIBUTE "HDF5_Version" {
     DATATYPE  H5T_STRING {
           STRSIZE 5;
           STRPAD H5T_STR_NULLTERM;
           CSET H5T_CSET_ASCII;
           CTYPE H5T_C_S1;
        }
     DATASPACE  SCALAR
  }
  GROUP "entry0" {
     ATTRIBUTE "NX_class" {
        DATATYPE  H5T_STRING {
              STRSIZE 7;
              STRPAD H5T_STR_NULLTERM;
              CSET H5T_CSET_ASCII;
              CTYPE H5T_C_S1;
           }
        DATASPACE  SCALAR
     }
     GROUP "data" {
        ATTRIBUTE "NX_class" {
           DATATYPE  H5T_STRING {
                 STRSIZE 6;
                 STRPAD H5T_STR_NULLTERM;
                 CSET H5T_CSET_ASCII;
                 CTYPE H5T_C_S1;
              }
           DATASPACE  SCALAR
        }
        DATASET "data" {
           DATATYPE  H5T_STD_I32LE
           DATASPACE  SIMPLE { ( 384, 256, 1024 ) / ( 384, 256, 1024 ) }
           STORAGE_LAYOUT {
              CHUNKED ( 384, 256, 1024 )
              SIZE 139221680 (2.892:1 COMPRESSION)
            }
           FILTERS {
              COMPRESSION DEFLATE { LEVEL 6 }
           }
           FILLVALUE {
              FILL_TIME H5D_FILL_TIME_IFSET
              VALUE  0
           }
           ALLOCATION_TIME {
              H5D_ALLOC_TIME_INCR
           }
           ATTRIBUTE "signal" {
              DATATYPE  H5T_STD_I32LE
              DATASPACE  SCALAR
           }
        }
     }

Compression type: NX_COMP_LZW
HDF5 version 1.8.3, called by the NeXus library 4.3.0.

Is there an explanation for such random behaviour? Are there solutions?




