Hi Peter,

My apologies. I think I misunderstood your question.

h5repack does rearrange the data in an HDF5 file. For example, some pieces of
HDF5 metadata that were "scattered" in the original file end up in one block in
the repacked file, and chunks are allocated "one after another", which may help
with disk access. But it is hard to say where the gain comes from without
profiling the application with the original file and then the repacked file.

Is the repacked file much smaller than the original?
Would it be possible to describe how the file was written? How does the
application access the file?

If you are not changing the layout or compression parameters with h5repack, it
is really surprising that h5repack helps so much. It is good to know that this
may be an option :-)
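
One way to check whether decoding alone could account for the read time is to
time zlib on a chunk-sized buffer outside HDF5 (the deflate filter is built on
zlib). This is only a rough sketch under stated assumptions: the data is
synthetic, so real chunks will compress differently, the timing includes
Python overhead, and the helper names are hypothetical. Since both files use
deflate level 6, the decoding cost should be about the same for each, and
whatever remains of the difference would point at file layout and I/O.

```python
import array
import time
import zlib

N_CHUNKS = 800 * 800      # chunk count for the layout in this thread
DEFLATE_LEVEL = 6         # compression level reported for the dataset


def make_chunk(n_values=1796):
    """Build one chunk-sized buffer of 4-byte floats (synthetic data)."""
    values = array.array("f", (0.001 * i for i in range(n_values)))
    return values.tobytes()


def time_decode(compressed, repeats=10000):
    """Return the seconds spent decompressing the same buffer `repeats` times."""
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(compressed)
    return time.perf_counter() - start


chunk = make_chunk()
compressed = zlib.compress(chunk, DEFLATE_LEVEL)
restored = zlib.decompress(compressed)

repeats = 10000
per_chunk = time_decode(compressed, repeats) / repeats
print(f"one chunk: {len(chunk)} bytes raw, {len(compressed)} bytes compressed")
print(f"estimated decode time for all chunks: {per_chunk * N_CHUNKS:.1f} s")
```

If that estimate is far below the observed read times, decoding is unlikely to
be the main cost in either file.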

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org   
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




On Sep 9, 2013, at 8:55 AM, "Steinberg, Peter" 
<[email protected]> wrote:

> I’m not comparing directly with h5repack but with my application reading the 
> dataset before and after running h5repack.
>  
> The quick profiling I’ve done showed that most of the time is spent in the
> H5Dread calls; I didn’t profile the internals of the HDF5 library.
>  
> The dataset (both before and after repacking) is H5T_IEEE_F32LE,
> 3-dimensional, 800 x 800 x 1796, with a chunk size of 1 x 1 x 1796, and
> compressed at deflate level 6.
>  
> Also, running a simple h5repack on the output from the first h5repack shows a 
> similar speed increase (h5repack outfile outfile2).
>  
> Thanks,
>   Peter
>  
> From: Hdf-forum [mailto:[email protected]] On Behalf Of 
> Elena Pourmal
> Sent: Sunday, September 08, 2013 5:39 PM
> To: HDF Users Discussion List
> Subject: Re: [Hdf-forum] hdf5 file organization questions?
>  
> Peter,
>  
> There are a few optimizations that h5repack does. For example, when rewriting
> chunked data, h5repack uses H5Ocopy if the applied filters and chunk sizes
> stay the same. It also uses hyperslab selections that coincide with the chunk
> boundaries and avoids datatype conversion if possible.
>  
> The comparison with h5repack may not be fair. For example, when an
> application reads compressed data, time will be spent in decoding, while
> h5repack avoids the decoding step completely. Have you profiled your
> application to see where the time is spent?
>  
> Elena
>  
> On Sep 6, 2013, at 2:28 PM, Steinberg, Peter wrote:
> 
> 
> My applications save a good-sized hdf5 dataset.
>  
> It takes a fairly long time to read back in (around 4 minutes).
>  
> If I do a simple repack on the file (h5repack infile outfile), reading it
> back in only takes around 1 minute.
>  
> What’s h5repack doing that speeds up the reads so much, and how do I
> implement that in my application?
>  
> (I was using h5repack to test different chunking sizes, and everything I did 
> in h5repack gave a similar time, including repacking to the same chunking 
> scheme as the original dataset).
>  
> Thanks,
>   Peter Steinberg
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
