Re: [Hdf-forum] H5Dread and array organization

Quincey Koziol Tue, 15 May 2012 11:27:07 -0700

Hi Ger!

On May 15, 2012, at 8:04 AM, Ger van Diepen wrote:


> Hi Gerd,
> 
> Thanks for the link to the document describing how chunked data are read. It 
> gives some good insights, but leaves me with a few questions.
> 
> 1. I cannot imagine that the first step is reading a chunk from disk. Doesn't 
> it look in the chunk cache first? If not, what is the purpose of the chunk 
> cache?

        Yes, it does look in the chunk cache first.

> 2. I would like to know in more detail what reading the chunk means. I assume 
> it is doing a B-tree lookup to find out where the chunk is located. What is 
> involved in that step?

        Yes, if the chunk isn't in the cache, the library does an index lookup 
on the coordinates of the chunk to find the address of the chunk in the 
file.  (I say "index lookup" because, although we currently use a B-tree, we 
are moving to more types of indices in the next major release (1.10.0), 
which will give a constant-time lookup in many cases.)  Once the address of the 
chunk is found, the chunk is brought into the cache (usually) and I/O is 
performed on it.
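The read path described above (check the chunk cache first, fall back to an index lookup only on a miss, then bring the chunk into the cache) can be sketched as a toy model in C. This is not the library's code: the slot hashing is only loosely inspired by the `rdcc_nslots` parameter of `H5Pset_chunk_cache`, and `index_lookup()` is a hypothetical stand-in for the B-tree search that simply computes a made-up address in the 2 x 2 x 10 chunk grid of the dataset discussed further down in this thread (20 x 20 x 10 elements, 10 x 10 x 1 chunks). Unlike the library, which brings the chunk into the cache only "usually", this model always caches it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NDIMS       3
#define NSLOTS      8     /* hypothetical slot count (cf. rdcc_nslots) */
#define CHUNK_BYTES 400u  /* assumption: 10 x 10 x 1 chunk of 4-byte ints */

typedef struct {
    int valid;
    int coords[NDIMS];    /* chunk coordinates, in units of chunks */
    uint64_t addr;        /* file address the chunk was "loaded" from */
} cache_slot;

typedef struct {
    cache_slot slots[NSLOTS];
    int index_lookups;    /* how often the simulated index was searched */
} chunk_cache;

/* Hypothetical stand-in for the B-tree search: maps chunk coordinates in a
   2 x 2 x 10 chunk grid to a made-up file address. */
static uint64_t index_lookup(chunk_cache *c, const int coords[NDIMS])
{
    uint64_t linear = ((uint64_t)coords[0] * 2u + (uint64_t)coords[1]) * 10u
                      + (uint64_t)coords[2];
    c->index_lookups++;
    return 4096u + linear * CHUNK_BYTES;
}

/* Hash chunk coordinates to a cache slot. */
static unsigned slot_of(const int coords[NDIMS])
{
    unsigned h = 0;
    for (int d = 0; d < NDIMS; d++)
        h = h * 31u + (unsigned)coords[d];
    return h % NSLOTS;
}

/* Return the file address of a chunk: cache first, index only on a miss. */
static uint64_t get_chunk(chunk_cache *c, const int coords[NDIMS])
{
    cache_slot *s = &c->slots[slot_of(coords)];
    assert(coords[0] >= 0 && coords[1] >= 0 && coords[2] >= 0);

    if (s->valid && memcmp(s->coords, coords, sizeof s->coords) == 0)
        return s->addr;                       /* hit: no index lookup */

    uint64_t addr = index_lookup(c, coords);  /* miss: search the index */
    /* "Read" the chunk into the slot, evicting whatever was there before. */
    s->valid = 1;
    memcpy(s->coords, coords, sizeof s->coords);
    s->addr = addr;
    return addr;
}
```

Repeated requests for the same chunk hit the cache and skip the index entirely; two chunks that hash to the same slot evict each other, which is why the number of slots matters.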

> 3. The diagram does not tell me why reading many small hyperslabs is so much 
> slower than reading a large hyperslab. Can it be that the B-tree lookup is 
> done over and over again, even if the chunk is in the cache?

        No, it's just slower due to some inefficiencies in the hyperslabbing 
code.  I made some progress speeding this up after we talked 2 years ago, but 
didn't have the time to finish the job.  It's probably only a few weeks of work 
to knock out the remaining slowness...

                Quincey

> Cheers,
> Ger
> 
> >>> "Gerd Heber" <ghe...@hdfgroup.org> 5/15/2012 2:25 PM >>>
> Mathieu, you should bear in mind that reading a dataset is logically a
> mapping between dataspaces. The underlying physical layout in the file is
> irrelevant for this mapping. Users would not appreciate getting different
> answers when reading the nominally same dataset with different physical
> layouts.
> Of course, not all layouts will give you the same performance.
> 
> > If I understand correctly, a chunked dataset is read chunk by chunk
> 
> That's a misunderstanding. Sometimes that's the case, but not always. Have a
> look at
> 
> http://www.hdfgroup.org/HDF5/doc/Advanced/DataFlow_H5Dread/DataFlow_H5Dread.pdf
> 
> Best, G.
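The point above, that a read is a mapping between dataspaces and the physical layout only affects performance, can be illustrated with a toy model in C. The two address functions are assumptions made for illustration (HDF5 exposes nothing like them): one maps an element of a 20 x 20 x 10 dataset to its offset in a contiguous row-major layout, the other to its offset when the data are stored as 10 x 10 x 1 chunks placed back to back. That chunked layout is a simplification; in a real file chunks can sit anywhere and are located through the chunk index.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 20, NY = 20, NZ = 10 };   /* dataset dimensions */
enum { CX = 10, CY = 10, CZ = 1 };    /* chunk dimensions */

typedef size_t (*addr_fn)(int x, int y, int z);

/* Offset of element (x,y,z) in a contiguous, row-major layout. */
static size_t addr_contiguous(int x, int y, int z)
{
    return ((size_t)x * NY + (size_t)y) * NZ + (size_t)z;
}

/* Offset of (x,y,z) when chunks are stored back to back in row-major chunk
   order and each chunk is row-major internally (assumed layout). */
static size_t addr_chunked(int x, int y, int z)
{
    size_t chunk  = ((size_t)(x / CX) * (NY / CY) + (size_t)(y / CY)) * (NZ / CZ)
                    + (size_t)(z / CZ);
    size_t within = ((size_t)(x % CX) * CY + (size_t)(y % CY)) * CZ
                    + (size_t)(z % CZ);
    return chunk * (CX * CY * CZ) + within;
}

/* Store the logical value 10000*x + 100*y + z at each element's physical
   offset under the given layout. */
static void fill_file(int *file, addr_fn addr)
{
    for (int x = 0; x < NX; x++)
        for (int y = 0; y < NY; y++)
            for (int z = 0; z < NZ; z++)
                file[addr(x, y, z)] = 10000 * x + 100 * y + z;
}

/* Copy the hyperslab [start, start+count) into a 1-D buffer in row-major
   order of the selection (last dimension varies fastest). */
static void read_hyperslab(const int *file, addr_fn addr,
                           const int start[3], const int count[3], int *out)
{
    size_t n = 0;
    for (int i = 0; i < count[0]; i++)
        for (int j = 0; j < count[1]; j++)
            for (int k = 0; k < count[2]; k++)
                out[n++] = file[addr(start[0] + i, start[1] + j, start[2] + k)];
}
```

Reading start={0 0 0}, count={4 4 4} through either layout gives identical 64-element buffers, even though the two "files" hold the values in a different physical order; what differs is how many chunks (or disk blocks) the read has to touch.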
> 
> 
> 
> -----Original Message-----
> From: hdf-forum-boun...@hdfgroup.org [mailto:hdf-forum-boun...@hdfgroup.org]
> On Behalf Of mathieu.westp...@obs.ujf-grenoble.fr
> Sent: Tuesday, May 15, 2012 5:23 AM
> To: hdf-forum@hdfgroup.org
> Subject: [Hdf-forum] H5Dread and array organization
> 
> Hello
> 
> I have a chunked dataset of size 20 x 20 x 10.
> 
> The chunk size is 10 x 10 x 1.
> 
> Let's say I read a hyperslab of data defined by:
> start={0 0 0}
> count={4 4 4}
> 
> 
> I read it into a 1-D array,
> 
> and I get:
> X Y Z
> 0 0 0
> 0 0 1
> 0 0 2
> 0 0 3
> 0 0 4
> 0 1 0
> 0 1 1
> 0 1 2
> 0 1 3
> 0 1 4
> 0 2 0
> 0 2 1
> ...
> 0 4 3
> 0 4 4
> 1 0 0
> 1 0 1
> ...
> 
> 
> If I understand correctly, a chunked dataset is read chunk by chunk, so I cannot
> understand how I can obtain this kind of order without completely reordering
> the data. A single chunk cannot contain two different Z values.
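The apparent contradiction goes away once the order in which chunks are visited is separated from the order of the destination buffer. The toy model below (a sketch of chunk-wise reading under an assumed in-memory chunk storage, not HDF5's actual code path) visits the chunks that intersect the selection one at a time. With 10 x 10 x 1 chunks, the hyperslab start={0 0 0}, count={4 4 4} intersects exactly four chunks, one per Z value. Each chunk's elements are written straight to their row-major positions in the output buffer (for the chunk at Z = z: offsets z, z+4, z+8, ..., z+60), so the buffer ends up in row-major order without a separate reordering pass.

```c
#include <assert.h>
#include <stddef.h>

enum { NX = 20, NY = 20, NZ = 10 };  /* dataset dimensions */
enum { CX = 10, CY = 10, CZ = 1 };   /* chunk dimensions */

/* One chunk's data, row-major inside the chunk (assumed storage). */
typedef struct { int data[CX][CY][CZ]; } chunk_t;

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

/* Gather a hyperslab chunk by chunk into a row-major 1-D buffer.
   chunks[cx][cy][cz] is the chunk grid; returns the number of chunks touched. */
static int read_by_chunks(chunk_t chunks[NX / CX][NY / CY][NZ / CZ],
                          const int start[3], const int count[3], int *out)
{
    int touched = 0;
    for (int cx = start[0] / CX; cx <= (start[0] + count[0] - 1) / CX; cx++)
        for (int cy = start[1] / CY; cy <= (start[1] + count[1] - 1) / CY; cy++)
            for (int cz = start[2] / CZ; cz <= (start[2] + count[2] - 1) / CZ; cz++) {
                touched++;  /* in a real read, the chunk would be fetched here */
                /* Intersection of this chunk with the selection. */
                int x0 = imax(start[0], cx * CX), x1 = imin(start[0] + count[0], (cx + 1) * CX);
                int y0 = imax(start[1], cy * CY), y1 = imin(start[1] + count[1], (cy + 1) * CY);
                int z0 = imax(start[2], cz * CZ), z1 = imin(start[2] + count[2], (cz + 1) * CZ);
                for (int x = x0; x < x1; x++)
                    for (int y = y0; y < y1; y++)
                        for (int z = z0; z < z1; z++) {
                            /* Row-major destination offset within the selection. */
                            size_t dst = ((size_t)(x - start[0]) * (size_t)count[1]
                                          + (size_t)(y - start[1])) * (size_t)count[2]
                                         + (size_t)(z - start[2]);
                            out[dst] = chunks[cx][cy][cz]
                                           .data[x - cx * CX][y - cy * CY][z - cz * CZ];
                        }
            }
    return touched;
}
```

The per-chunk work is a strided copy into the destination buffer, roughly the "scatter to memory" idea; its cost grows with the number of small contiguous runs to copy, not with any global sort of the data.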
> 
> So:
> Is this normal? Does HDF5 reorder the data (wasting time and resources)?
> Is there any way to control this order (not row-major, but, let's say,
> Z-major)?
> 
> Thanks for helping.
> 
> Mathieu
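For hyperslab selections, H5Dread pairs elements of the file and memory selections in row-major order, so a Z-major buffer needs either one read per Z plane or a reorder after the read. The plane-by-plane variant (untested sketch) would select start={0, 0, z}, count={4, 4, 1} in the file dataspace with `H5Sselect_hyperslab`, use a 16-element memory dataspace, and pass `buf + 16*z` to `H5Dread`; with 10 x 10 x 1 chunks each such plane also lies inside a single chunk. The C function below sketches the other option. Its name `to_z_major` and the target order [Z][X][Y] are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Reorder a row-major buffer in[nx][ny][nz] (Z fastest, as H5Dread fills it
   for the hyperslab above) into Z-major order out[nz][nx][ny] (Z slowest).
   Hypothetical helper; not an in-place transpose. */
static void to_z_major(const int *in, int *out, int nx, int ny, int nz)
{
    assert(in != out);
    for (int x = 0; x < nx; x++)
        for (int y = 0; y < ny; y++)
            for (int z = 0; z < nz; z++)
                out[((size_t)z * (size_t)nx + (size_t)x) * (size_t)ny + (size_t)y] =
                    in[((size_t)x * (size_t)ny + (size_t)y) * (size_t)nz + (size_t)z];
}
```

After `to_z_major(buf, zbuf, 4, 4, 4)`, the 16 values for Z = 0 come first, then the 16 for Z = 1, and so on.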
> 
> 
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> Hdf-forum@hdfgroup.org
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
> 
