Vikram,

1. That depends very much on whether you use compression. If you don't use 
compression, then it may be faster to disable the chunk cache and use something 
like m x 1 x 1 or m x 1 x p chunks (or some value between 1 and p for the third 
dimension). This will cause reads in the first case to fetch n single elements 
(not ideal, but at least not bandwidth intensive), and in the second case to 
fetch between 1 and p whole chunks. If you use the chunk cache with this 
scheme, and it is set large enough to hold all chunks for a single component, 
it will greatly improve the first case when the time value changes, but 
greatly increase bandwidth when the component changes, since whole chunks must 
be read in, as opposed to single elements without the cache.
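
For illustration only (file/dataset names and sizes below are made up, not 
from your mail), the uncompressed scheme might look roughly like this in the 
C API:

    #include "hdf5.h"

    /* Sketch: m x 1 x p chunks, chunk cache disabled at read time.
     * All names and sizes here are placeholders. */
    hsize_t m = 1000, n = 500000, p = 3;
    hsize_t dims[3]  = {m, n, p};
    hsize_t chunk[3] = {m, 1, p};    /* one chunk per grid point */

    hid_t file  = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 3, chunk);
    hid_t space = H5Screate_simple(3, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "field", H5T_NATIVE_DOUBLE, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Read side: rdcc_nbytes = 0 disables the chunk cache entirely. */
    hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl, H5D_CHUNK_CACHE_NSLOTS_DEFAULT, 0,
                       H5D_CHUNK_CACHE_W0_DEFAULT);
    hid_t dset_rd = H5Dopen2(file, "field", dapl);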


If you are using compression, then you most likely want to use the chunk cache 
(at the very least it won't hurt). I would think you'd want more squarish 
chunks here, with the size determined by whether you prioritize bandwidth 
(smaller chunks) or latency (larger chunks), and the shape determined by how 
much you prioritize one case over the other: the chunks should "flatten" to 
resemble the read pattern you are prioritizing, and flatten more to prioritize 
it more at the expense of the other pattern. If you can set the chunk cache 
large enough to hold all chunks involved in an operation (or in more than one 
operation), then that will greatly improve performance when subsequent reads 
touch the same chunks.
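
For instance (just a sketch; the 64 x 64 x p shape is only an example of 
"squarish", and gzip is one filter choice among several):

    /* Compressed case: deflate plus roughly squarish chunks.  Flatten
     * the chunk shape toward whichever read pattern you want to favor. */
    hsize_t p = 3;                        /* placeholder */
    hsize_t sq_chunk[3] = {64, 64, p};    /* example shape only */
    hid_t dcpl_z = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl_z, 3, sq_chunk);
    H5Pset_deflate(dcpl_z, 4);            /* gzip, moderate level */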


2. As above, the optimal chunk cache setting may be to disable it, if no 
compression is used. If it is not disabled, then generally the larger the 
better, though there is a point of diminishing returns. It should ideally be at 
least as large as all chunks involved in an operation.
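
To make that concrete (a sketch; every number below is a placeholder, and the 
slot count is just "a prime comfortably larger than the number of chunks that 
will be cached"):

    /* Size the cache to hold every chunk one time-history read touches,
     * assuming the 64 x 64 x p chunks from the previous sketch. */
    hsize_t m = 1000, p = 3;                          /* placeholders */
    size_t chunk_bytes   = 64 * 64 * (size_t)p * sizeof(double);
    size_t chunks_per_op = (size_t)((m + 63) / 64);   /* chunks along the time axis */

    hid_t file   = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dapl_z = H5Pcreate(H5P_DATASET_ACCESS);
    H5Pset_chunk_cache(dapl_z,
                       12421,                        /* nslots: prime >> cached chunk count */
                       chunks_per_op * chunk_bytes,  /* rdcc_nbytes: one operation's chunks */
                       1.0);                         /* w0: evict fully read chunks first */
    hid_t dset_z = H5Dopen2(file, "field", dapl_z);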


3. That slide only referred to the fact that the chunk cache size had no effect 
on the results of that specific test (unless the cache were set large enough to 
hold the entire dataset). I agree those slides by themselves don't do a good 
job of explaining exactly what's going on. The chunk cache size definitely can 
affect performance with compression.


4. I'm not sure what you mean by this. Individual chunks are always contiguous 
both in memory and on disk. Do you mean placing all the chunks next to each 
other?


Thanks,

-Neil


________________________________
From: Hdf-forum <[email protected]> on behalf of 
Bhamidipati, Vikram <[email protected]>
Sent: Friday, April 1, 2016 8:55 PM
To: [email protected]
Subject: [Hdf-forum] Chunking size and compression for large datasets


Hello,



I have been reading some posts in this forum about chunking in HDF5 and ways 
of optimizing reads on large datasets. The emphasis is on read time 
optimization rather than write time optimization because of the way the code 
works. I have a large 3D dataset of native double datatype. The dimensions are 
m by n by p (in row-major storage), where m is the time dimension size, n is 
the grid point index size, and p is the field quantity component size (always 
fixed).



The first access use case is as follows:

1. The graphics visualization requests data for the whole grid at a given 
time index and component index. It would seem the best approach here is to 
read a single hyperslab with a stride equal to the size of the field 
components along the 3rd dimension and equal to 1 along the grid point index, 
with a count of 1 along the time index and the field quantity index (see the 
sketch after this list).

2. When the time index changes, the start value for the 1st dimension (time 
index) is changed while all other parameters stay constant.

3. When the field component changes, the start value for the 3rd dimension 
(field component index) is changed.
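
In code, what I have in mind is roughly the following (dimension sizes and 
names are placeholders):

    #include "hdf5.h"
    #include <stdlib.h>

    /* Use case 1 sketch: read the whole grid at time index t, component
     * index c, into a buffer of n doubles. */
    hsize_t m = 1000, n = 500000, p = 3;   /* placeholder dimensions */
    hsize_t t = 0, c = 0;                  /* requested time/component */
    hsize_t start[3]  = {t, 0, c};
    hsize_t stride[3] = {1, 1, p};   /* step over all p components */
    hsize_t count[3]  = {1, n, 1};   /* 1 time, n grid points, 1 component */

    hid_t file = H5Fopen("data.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
    hid_t dset = H5Dopen2(file, "field", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, stride, count, NULL);

    hid_t   mspace = H5Screate_simple(1, &n, NULL);
    double *buf    = malloc((size_t)n * sizeof(double));
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
    /* Changing the time or component only changes start[0] or start[2]. */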



The second use case is as follows:

1. The algorithm chooses certain grid points of interest (not necessarily 
adjacent in memory) and the code requests a “time history” for those grid 
points, which includes all field components for each node. So the request is 
for data over all time indices and all components for a small subset (a 
negligible fraction of the total) of grid points. The best approach here would 
seem to be to read a union of hyperslabs, where the union is over the 
2nd-dimension indices (see the sketch after this list).

2. Step 1 is repeated many more times for various locations in the grid. 
Since multiple threads are running and making these requests independently, 
there is no possibility of further unions of hyperslabs.
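
Sketched out (grid point indices here are made up; file and dataset opened as 
in the sketch above):

    /* Use case 2 sketch: full time history, all components, for a few
     * scattered grid points, as a union of hyperslabs in one read. */
    hsize_t m = 1000, p = 3;               /* placeholder dimensions */
    hsize_t points[] = {17, 503, 8210};    /* example grid point indices */
    size_t  npts = sizeof points / sizeof points[0];

    hid_t fspace = H5Dget_space(dset);
    H5Sselect_none(fspace);
    for (size_t i = 0; i < npts; i++) {
        hsize_t start[3] = {0, points[i], 0};
        hsize_t count[3] = {m, 1, p};      /* all times, one point, all components */
        H5Sselect_hyperslab(fspace, H5S_SELECT_OR, start, NULL, count, NULL);
    }

    hsize_t nelem  = npts * m * p;
    hid_t   mspace = H5Screate_simple(1, &nelem, NULL);
    double *buf    = malloc((size_t)nelem * sizeof(double));
    H5Dread(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);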



It would seem the two read access patterns have somewhat conflicting needs, 
since the first access pattern holds the 1st dimension constant whereas the 
second holds the 2nd dimension constant for each hyperslab. If pushed to make 
a choice, I would optimize the second read pattern over the first, since it 
critically affects execution time. I also intend to use the ‘H5S_SELECT_OR’ 
selection operator to form the union of hyperslabs before the h5dread call. As 
previously mentioned, write time is not very critical but read access time is. 
So my questions are:



1. What chunk sizes would work best? I am planning on using an m by 1 by p 
chunk size when writing the dataset, provided m is large enough to push the 
chunk size over 1 MB. If it is smaller, I would increase the 2nd dimension 
size. Is this the right strategy?

2. Can I set a cache size for the dataset to optimize read time? I read about 
using h5pset_chunk_cache. Since I know how many bytes are in each chunk I am 
going to request, should I set the cache size to the number of grid points 
times the data size for each grid point? Also, is this function needed only 
during reads (since write time optimization is not an issue)?

3. What compression method, if any, should be used? I read in a tutorial on 
chunking 
(https://www.hdfgroup.org/HDF5/doc/Advanced/Chunking/Chunking_Tutorial_EOS13_2009.pdf)
 that if compression is used, it does not matter what the cache size is. Is 
this correct? Why? I did not understand the tutorial's explanation that, since 
the entire chunk is always read for each h5dread call when compression is 
used, the cache size does not matter. Any clarification on how compression 
helps optimize read time would be helpful.

4. Also, while writing the dataset, can I force chunks to be contiguous in 
memory to reduce seek times?



Thank you,

Vikram
