Please try the h5stat tool to see why there is a difference in file sizes.

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
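[As a concrete starting point for Elena's suggestion, assuming the HDF5 command-line tools are on your PATH; the file names below are placeholders, and exact flags vary by release (see h5stat --help):]

```shell
# Full per-file statistics: metadata vs. raw-data storage, free space,
# chunk storage overhead, etc. Run on both files and diff the output.
h5stat original.h5
h5stat repacked.h5

# A condensed file-space summary (the -S/--summary flag, where the
# installed h5stat provides it) is often enough to spot the difference.
h5stat -S original.h5
h5stat -S repacked.h5
```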
On Sep 10, 2013, at 5:34 PM, "Steinberg, Peter" <[email protected]> wrote:

> Original file: 4,411,531,805 bytes
> Repacked file: 4,286,032,869 bytes
>
> The dataset is 800 x 800 x 1796 floats (H5T_IEEE_F32LE).
>
> The application writes out chunks of size 1 x 1 x 1796 as the data is
> acquired.
>
> The data is read back in chunks of 1 x 800 x 1796 (earlier testing showed
> this gave the best performance while still allowing me to give useful read
> progress updates).
>
> I had tried writing out the data in chunks of size 1 x 25 x 1796 (and
> various other values for the second index), but that was slower, as I try
> to flush the data to disk after each 1 x 1 x 1796 data block to minimize
> data loss in case of hardware / software issues.
>
> I can try rebuilding the HDF5 library and look for some way to profile it.
>
> Peter
>
> From: Hdf-forum [mailto:[email protected]] On Behalf Of Elena Pourmal
> Sent: Tuesday, September 10, 2013 5:24 PM
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] hdf5 file organization questions?
>
> Hi Peter,
>
> My apologies. I think I misunderstood your question.
>
> h5repack does rearrange data in an HDF5 file. For example, some pieces of
> HDF5 metadata that were "scattered" in the original file are in one block
> in the repacked file; chunks are allocated one after another, and that may
> help with disk access. But it is hard to say where the gain is without
> profiling the application against the original and then the repacked file.
>
> Is the size of the repacked file much smaller?
> Would it be possible to describe how the file was written? How does the
> application access the file?
>
> If you are not changing layout and compression parameters with h5repack,
> it is really surprising that h5repack helps so much. It is good to know
> that this may be an option :-)
>
> Elena
>
> On Sep 9, 2013, at 8:55 AM, "Steinberg, Peter"
> <[email protected]> wrote:
>
> I'm not comparing directly with h5repack but with my application reading
> the dataset before and after running h5repack.
>
> The quick profiling I've done showed the big time use in the H5Dread
> calls; I didn't profile the internals of the HDF5 library.
>
> The dataset (both before and after repacking) is H5T_IEEE_F32LE,
> 3-dimensional, 800 x 800 x 1796, with a chunk size of 1 x 1 x 1796, and
> compressed at deflate level 6.
>
> Also, running a simple h5repack on the output from the first h5repack
> shows a similar speed increase (h5repack outfile outfile2).
>
> Thanks,
> Peter
>
> From: Hdf-forum [mailto:[email protected]] On Behalf Of Elena Pourmal
> Sent: Sunday, September 08, 2013 5:39 PM
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] hdf5 file organization questions?
>
> Peter,
>
> There are a few optimizations that h5repack does. For example, when
> rewriting chunked data, h5repack uses H5Ocopy if the applied filters and
> chunk sizes stay the same. It also uses hyperslab selections that coincide
> with the chunk boundaries, and it avoids datatype conversion when possible.
>
> The comparison with h5repack may not be fair. For example, when an
> application reads compressed data, time will be spent in decoding, while
> h5repack avoids the decoding step completely. Have you profiled your
> application to see where the time is spent?
>
> Elena
>
> On Sep 6, 2013, at 2:28 PM, Steinberg, Peter wrote:
>
> > My applications save a good-sized HDF5 dataset.
> > It takes a fairly long time to read back in (around 4 minutes).
> > If I do a simple repack on the file (h5repack infile outfile), reading
> > it back in only takes around 1 minute.
> >
> > What's h5repack doing that speeds up the reads so much, and how do I
> > implement that in my application?
> >
> > (I was using h5repack to test different chunk sizes, and everything I
> > did in h5repack gave a similar time, including repacking to the same
> > chunking scheme as the original dataset.)
> >
> > Thanks,
> > Peter Steinberg
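[A quick way to see why the 1 x 1 x 1796 write chunks hurt the 1 x 800 x 1796 reads discussed above is to count how many chunks each read must visit. A minimal sketch in plain Python, no HDF5 required; the shapes are taken directly from the thread:]

```python
from math import ceil, prod

def chunks_touched(read_shape, chunk_shape):
    # A contiguous, chunk-aligned hyperslab intersects ceil(read/chunk)
    # chunks along each axis; the total is the product over all axes.
    return prod(ceil(r / c) for r, c in zip(read_shape, chunk_shape))

dataset   = (800, 800, 1796)   # full dataset extent
read_slab = (1, 800, 1796)     # the read pattern described above

# Chunks in the whole dataset with the 1 x 1 x 1796 write chunks:
print(chunks_touched(dataset, (1, 1, 1796)))     # 640000

# Chunks (each separately located and inflated) per slab read:
print(chunks_touched(read_slab, (1, 1, 1796)))   # 800
print(chunks_touched(read_slab, (1, 25, 1796)))  # 32
```

Each touched chunk means a separate B-tree lookup, seek, and deflate decode, which is consistent with the profiling showing the time inside H5Dread.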
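[Since h5repack comes up throughout the thread: chunk shape and filter can also be changed at repack time rather than in the writing application. A sketch under assumptions — "/data" stands in for the real dataset path inside the file, and the file names are placeholders; -l CHUNK and -f GZIP are standard h5repack options:]

```shell
# Rechunk the dataset to 1 x 25 x 1796 and keep deflate level 6.
h5repack -l /data:CHUNK=1x25x1796 -f /data:GZIP=6 infile.h5 outfile.h5
```

This lets the acquisition code keep its crash-safe 1 x 1 x 1796 flush pattern while an offline repack produces a read-friendly layout.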
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
