Hi all,
I've been following this for a while, and while I still don't have a
complete picture of what the benchmark is doing, I have an idea that
might shed some light on this.
My understanding is that the chunk cache is set large enough to hold one
vector of chunks but not one plane of chunks, correct? If the vectors
are iterated over in the obvious way (i.e. one plane at a time), then
each vector of chunks in the chunk cache has to be thrown away once for
every plane that passes through those chunks, which obviously hurts
performance. In that case it would be better to disable the chunk cache
entirely (set nbytes or nslots to 0), which is why Francesc sees better
performance with the smaller chunk cache. Ideally, you would either use
a chunk cache large enough for a plane of chunks, or iterate over the
entire vector of chunks before moving on to the next vector of chunks
in the plane; see the sketch below.
Does this make sense?
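For reference, here's roughly what sizing (or disabling) the per-dataset
chunk cache looks like with the 1.8 API (a minimal sketch, not code from
the benchmark; the file name, dataset name, and sizes are placeholders):

    #include "hdf5.h"

    /* Open a dataset with a chunk cache big enough for one full plane of
     * chunks.  plane_bytes = bytes in a plane of chunks, plane_chunks =
     * number of chunks in that plane; both hypothetical here. */
    hid_t open_with_plane_cache(const char *path, size_t plane_bytes,
                                size_t plane_chunks)
    {
        hid_t file = H5Fopen(path, H5F_ACC_RDONLY, H5P_DEFAULT);
        hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);

        /* nslots: a prime roughly 100x the number of chunks that fit in
         * the cache is the usual recommendation; the +1 below is only a
         * stand-in, pick a real prime in practice. */
        H5Pset_chunk_cache(dapl, 100 * plane_chunks + 1, plane_bytes,
                           H5D_CHUNK_CACHE_W0_DEFAULT);

        /* To disable the chunk cache instead, set the sizes to zero:
         * H5Pset_chunk_cache(dapl, 0, 0, H5D_CHUNK_CACHE_W0_DEFAULT); */

        hid_t dset = H5Dopen2(file, "dset", dapl);
        H5Pclose(dapl);
        return dset;
    }

Alternatively, restructure the read loop so that every vector in the
current row of chunks is read before moving on to the next row; then
even a cache that holds only one vector of chunks never thrashes.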
Here are some (2-D) diagrams to help illustrate what I'm talking about:
In cache
|
V
+----+----+----+
|X | | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
+----+----+----+
|.X | | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
+----+----+----+
|..X | | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
+----+----+----+
|...X| | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
Evicted
| In cache
V V
+----+----+----+
|....|X | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
+----+----+----+
|....|.X | |
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
...
In cache
V
+----+----+----+
|....|....|...X|
| | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
Read back into cache from disk for the second time
| Evicted
V V
+----+----+----+
|....|....|....|
|X | | |
+----+----+----+
| | | |
| | | |
+----+----+----+
...
-Neil
On 01/12/2010 09:10 AM, Ger van Diepen wrote:
Hi Quincey,
It uses fixed dimensions.
It iterates over vectors in the x, y, or z direction. In the test the z
axis was quite short, so it resulted in many small hyperslab accesses.
Of course, I could access a plane at a time and extract the vectors
myself, but I would think that is just what the cache should do for me.
Besides, I do not always know whether a plane fits in memory.
I can imagine that doing so many B-tree lookups is quite expensive. Do
you think that is the main performance bottleneck?
But why so many B-tree lookups? Because the chunk and array sizes are
fixed, it can derive the chunks to use from the hyperslab coordinates.
Then it can first look whether they are in the cache, which is (I
assume) much faster than looking in the chunk B-tree.
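Something like this, I mean (an illustrative fragment only, assuming
fixed dimensions; not actual HDF5 internals):

    #include "hdf5.h"  /* for hsize_t */

    /* With fixed array and chunk dimensions, the chunk holding an
     * element follows directly from the element's coordinates: no
     * B-tree walk is needed to identify it. */
    void element_to_chunk(const hsize_t coord[3],      /* element coords */
                          const hsize_t chunk_dims[3], /* e.g. {32,32,2} */
                          hsize_t chunk_idx[3])        /* chunk per axis */
    {
        for (int i = 0; i < 3; i++)
            chunk_idx[i] = coord[i] / chunk_dims[i];
    }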
Does the experimental branch improve things in this way?
Note that Francesc's original question is still open: why does
performance degrade when using a larger chunk cache?
Cheers,
Ger
On 1/12/2010 at 3:27 PM, in message
<f2abecc0-20f4-4a1e-b178-bde7baee7...@hdfgroup.org>, Quincey Koziol
<koz...@hdfgroup.org> wrote:
Hi Francesc,
On Jan 8, 2010, at 11:45 AM, Francesc Alted wrote:
Hi Ger,
On Friday 08 January 2010 08:25:09, you wrote:
Hi Francesc,
This might be related to a problem I reported last June. I did tests
using a 3-dim array with various chunk shapes and access patterns. It
got very slow when iterating through the data by vector in the
Z-direction. I believe it was filed as a bug by the HDF5 group. I sent
a test program to Quincey that shows the behaviour. I'll forward that
mail and the test program to you, so you can try it out yourself if
you like.
I suspect the cache lookup algorithm to be the culprit. The larger the
cache and the more often it has to do lookups, the slower things get.
BTW, did you adapt the cache's hash size to the number of slots in the
cache?
Thanks for your suggestion. I've been looking at your problem, and my
profiles seem to say that it is not a cache issue.
Have a look at the attached screenshots showing profiles for your test
bed reading along the x axis with a cache size of 4 KB (the HDF5 cache
subsystem does not come into play at all) and 256 KB (your size). I've
also added a profile for the tiled case for comparison purposes. For
all the profiles (except tiled), the bottleneck is clearly in the
`H5FL_reg_free` and `H5FL_reg_malloc` calls, no matter how large the
cache size is (even when it does not come into play).
I think this is expected, because HDF5 has to reserve space for each
I/O operation. When you walk the dataset along the x or y direction,
you have to do (32*2)x more I/O operations than for the tiled case, and
HDF5 needs to allocate (and free again!) (32*2)x more memory areas.
Also, when you read along the z axis, memory has to be allocated and
released (32*32)x more times. All of this is consistent with both the
profiles and with running the benchmark manually:
fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 t
real 0m0.057s
user 0m0.048s
sys 0m0.004s
fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 x
setting cache to 32 chunks (4096 bytes) with 3203 slots  // forcing no cache
real 0m1.055s
user 0m0.860s
sys 0m0.168s
fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 y
setting cache to 32 chunks (262144 bytes) with 3203 slots
real 0m1.211s
user 0m1.176s
sys 0m0.028s
fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 z
setting cache to 5 chunks (40960 bytes) with 503 slots
real 0m14.813s
user 0m14.777s
sys 0m0.024s
So, in my opinion, there is little that HDF5 can do here. You had
better adapt the chunk shape to your most-used access pattern (if you
have just one, though I know that this is typically not the case).
Interesting results. I'll try to comment on a number of fronts:
- The H5FL_* routines are an internal "free list" that the HDF5
library uses in order to avoid calling malloc() as frequently. You can
disable them for your benchmarking by configuring the library with
"--enable-using-memchecker".
- It looks like you are using the hyperslab routines quite a bit; are
you creating complicated hyperslabs?
- The B-tree comparison routine is being used to locate the correct
chunk to perform I/O on, which may or may not be in the cache. Do your
datasets have fixed or unlimited dimensions? If they are fixed (or have
only one unlimited dimension), I can point you at an experimental
branch with the new chunk indexing features and we can see if that
would help your performance.
Quincey
In your tests you only mention the chunk size, but not the chunk shape.
Isn't that important? It gives me the impression that in your tests the
data are stored and accessed fully sequentially, which makes the cache
useless.
Yes, chunk shape is important; sorry, I forgot this important detail.
As I mentioned in a previous message to Rob Latham, I want to optimize
'semi-random' access mode in a certain row of the dataset, so I
normally choose the chunk shape as (1, X), where X is the value needed
to obtain a chunksize between 1 KB and 8 MB (if X is larger than the
maximum number of columns, I expand the number of rows in the chunk
shape accordingly).
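In code, that rule would look roughly like this (a sketch with made-up
names; target_bytes is the chosen chunksize between 1 KB and 8 MB):

    #include "hdf5.h"  /* for hsize_t */

    /* (1, X) chunking rule: one row per chunk, X columns, where X gives
     * a chunk of about target_bytes; if X would exceed the number of
     * columns, cap it there and grow the row count instead. */
    void choose_chunk_shape(hsize_t ncols, size_t elem_size,
                            size_t target_bytes, hsize_t chunk[2])
    {
        hsize_t x = target_bytes / elem_size;   /* elements per chunk */
        chunk[0] = 1;
        chunk[1] = x;
        if (x > ncols) {
            chunk[1] = ncols;                   /* cap at the column count */
            chunk[0] = (x + ncols - 1) / ncols; /* expand rows instead */
        }
    }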
Thanks,
--
Francesc Alted
[Attachments: tHDF5-tile.png, tHDF5-x-4KB.png, tHDF5-x-256KB.png]
_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org