Please try the h5stat tool to see why there is a difference in file sizes.

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
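[As a concrete starting point for Elena's suggestion, assuming the HDF5 command-line tools are on your PATH; the file names below are placeholders, and exact flags vary by release (see h5stat --help):]

```shell
# Full per-file statistics: metadata vs. raw-data storage, free space,
# chunk storage overhead, etc. Run on both files and diff the output.
h5stat original.h5
h5stat repacked.h5

# A condensed file-space summary (the -S/--summary flag, where the
# installed h5stat provides it) is often enough to spot the difference.
h5stat -S original.h5
h5stat -S repacked.h5
```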
On Sep 10, 2013, at 5:34 PM, "Steinberg, Peter" <[email protected]> wrote:

> Original file: 4,411,531,805 bytes
> Repacked file: 4,286,032,869 bytes
>
> The dataset is 800 x 800 x 1796 floats (H5T_IEEE_F32LE).
>
> The application writes out chunks of size 1 x 1 x 1796 as the data is
> acquired.
>
> The data is read back in chunks of 1 x 800 x 1796 (earlier testing showed
> this gave the best performance while still allowing me to give useful read
> progress updates).
>
> I had tried writing out the data in chunks of size 1 x 25 x 1796 (and
> various other values for the second index), but that was slower, as I try
> to flush the data to disk after each 1 x 1 x 1796 data block to minimize
> data loss in case of hardware / software issues.
>
> I can try rebuilding the HDF5 library and look for some way to profile it.
>
> Peter
>
> From: Hdf-forum [mailto:[email protected]] On Behalf Of Elena Pourmal
> Sent: Tuesday, September 10, 2013 5:24 PM
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] hdf5 file organization questions?
>
> Hi Peter,
>
> My apologies. I think I misunderstood your question.
>
> h5repack does rearrange data in an HDF5 file. For example, some pieces of
> HDF5 metadata that were "scattered" in the original file are in one block
> in the repacked file; chunks are allocated one after another, and that may
> help with disk access. But it is hard to say where the gain is without
> profiling the application against the original and then the repacked file.
>
> Is the size of the repacked file much smaller?
> Would it be possible to describe how the file was written? How does the
> application access the file?
>
> If you are not changing layout and compression parameters with h5repack,
> it is really surprising that h5repack helps so much. It is good to know
> that this may be an option :-)
>
> Elena
>
> On Sep 9, 2013, at 8:55 AM, "Steinberg, Peter"
> <[email protected]> wrote:
>
> I'm not comparing directly with h5repack but with my application reading
> the dataset before and after running h5repack.
>
> The quick profiling I've done showed the big time use in the H5Dread
> calls; I didn't profile the internals of the HDF5 library.
>
> The dataset (both before and after repacking) is H5T_IEEE_F32LE,
> 3-dimensional, 800 x 800 x 1796, with a chunk size of 1 x 1 x 1796, and
> compressed at deflate level 6.
>
> Also, running a simple h5repack on the output from the first h5repack
> shows a similar speed increase (h5repack outfile outfile2).
>
> Thanks,
> Peter
>
> From: Hdf-forum [mailto:[email protected]] On Behalf Of Elena Pourmal
> Sent: Sunday, September 08, 2013 5:39 PM
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] hdf5 file organization questions?
>
> Peter,
>
> There are a few optimizations that h5repack does. For example, when
> rewriting chunked data, h5repack uses H5Ocopy if the applied filters and
> chunk sizes stay the same. It also uses hyperslab selections that coincide
> with the chunk boundaries, and it avoids datatype conversion when possible.
>
> The comparison with h5repack may not be fair. For example, when an
> application reads compressed data, time will be spent in decoding, while
> h5repack avoids the decoding step completely. Have you profiled your
> application to see where the time is spent?
>
> Elena
>
> On Sep 6, 2013, at 2:28 PM, Steinberg, Peter wrote:
>
> > My applications save a good-sized HDF5 dataset.
> > It takes a fairly long time to read back in (around 4 minutes).
> > If I do a simple repack on the file (h5repack infile outfile), reading
> > it back in only takes around 1 minute.
> >
> > What's h5repack doing that speeds up the reads so much, and how do I
> > implement that in my application?
> >
> > (I was using h5repack to test different chunk sizes, and everything I
> > did in h5repack gave a similar time, including repacking to the same
> > chunking scheme as the original dataset.)
> >
> > Thanks,
> > Peter Steinberg
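[A quick way to see why the 1 x 1 x 1796 write chunks hurt the 1 x 800 x 1796 reads discussed above is to count how many chunks each read must visit. A minimal sketch in plain Python, no HDF5 required; the shapes are taken directly from the thread:]

```python
from math import ceil, prod

def chunks_touched(read_shape, chunk_shape):
    # A contiguous, chunk-aligned hyperslab intersects ceil(read/chunk)
    # chunks along each axis; the total is the product over all axes.
    return prod(ceil(r / c) for r, c in zip(read_shape, chunk_shape))

dataset   = (800, 800, 1796)   # full dataset extent
read_slab = (1, 800, 1796)     # the read pattern described above

# Chunks in the whole dataset with the 1 x 1 x 1796 write chunks:
print(chunks_touched(dataset, (1, 1, 1796)))     # 640000

# Chunks (each separately located and inflated) per slab read:
print(chunks_touched(read_slab, (1, 1, 1796)))   # 800
print(chunks_touched(read_slab, (1, 25, 1796)))  # 32
```

Each touched chunk means a separate B-tree lookup, seek, and deflate decode, which is consistent with the profiling showing the time inside H5Dread.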
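[Since h5repack comes up throughout the thread: chunk shape and filter can also be changed at repack time rather than in the writing application. A sketch under assumptions — "/data" stands in for the real dataset path inside the file, and the file names are placeholders; -l CHUNK and -f GZIP are standard h5repack options:]

```shell
# Rechunk the dataset to 1 x 25 x 1796 and keep deflate level 6.
h5repack -l /data:CHUNK=1x25x1796 -f /data:GZIP=6 infile.h5 outfile.h5
```

This lets the acquisition code keep its crash-safe 1 x 1 x 1796 flush pattern while an offline repack produces a read-friendly layout.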
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
