Hi Quincey,

It uses fixed dimensions. It iterates over vectors in the x, y, or z direction. In the test the z axis was quite short, so this results in many small hyperslab accesses. Of course, I could access a plane at a time and extract the vectors myself, but I would think that is exactly what the cache should do for me. Besides, I do not always know whether a plane fits in memory.
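To be concrete, each access in the test is one small hyperslab read along z, roughly like the following (a minimal sketch against the HDF5 C API, not the actual test program; `dset`, `buf`, the dimension order, and the z length of 10 are illustrative):

  #include "hdf5.h"

  #define NZ 10   /* length of the short z axis, as in the benchmark runs */

  /* Read the z-direction vector at (x, y): one small hyperslab per call. */
  static herr_t read_z_vector(hid_t dset, hsize_t x, hsize_t y, float *buf)
  {
      hsize_t start[3] = { x, y, 0 };    /* fix x and y ...               */
      hsize_t count[3] = { 1, 1, NZ };   /* ... and take the whole z axis */
      hsize_t veclen   = NZ;

      hid_t fspace = H5Dget_space(dset);
      H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

      hid_t  mspace = H5Screate_simple(1, &veclen, NULL);
      herr_t status = H5Dread(dset, H5T_NATIVE_FLOAT, mspace, fspace,
                              H5P_DEFAULT, buf);

      H5Sclose(mspace);
      H5Sclose(fspace);
      return status;
  }

Iterating this over a 1024x1024 x/y grid means about a million such selections and reads, which is why the per-call overhead dominates.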
I can imagine that doing so many B-tree lookups is quite expensive. Do you think that is the main performance bottleneck? But why so many B-tree lookups? Because the chunk and array sizes are fixed, the library can derive the chunks to use from the hyperslab coordinates. Then it can first check whether they are in the cache, which is (I assume) much faster than looking in the chunk B-tree. Does the experimental branch improve things in this way?

Note that Francesc's original question is still open: why does performance degrade when using a larger chunk cache?

Cheers,
Ger

>>> On 1/12/2010 at 3:27 PM, in message
>>> <f2abecc0-20f4-4a1e-b178-bde7baee7...@hdfgroup.org>, Quincey Koziol
>>> <koz...@hdfgroup.org> wrote:
> Hi Francesc,
>
> On Jan 8, 2010, at 11:45 AM, Francesc Alted wrote:
>
>> Hi Ger,
>>
>> On Friday, 08 January 2010 08:25:09, you wrote:
>>> Hi Francesc,
>>>
>>> This might be related to a problem I reported last June.
>>> I did tests using a 3-dim array with various chunk shapes and access
>>> patterns. It got very slow when iterating through the data by vector in
>>> the z-direction. I believe it was filed as a bug by the HDF5 group. I
>>> sent a test program to Quincey that shows the behaviour. I'll forward
>>> that mail and the test program to you, so you can try it out yourself
>>> if you'd like.
>>>
>>> I suspect the cache lookup algorithm is the culprit. The larger the
>>> cache and the more often it has to do lookups, the slower things get.
>>> BTW, did you adapt the cache's hash size to the number of slots in
>>> the cache?
>>
>> Thanks for your suggestion. I've been looking at your problem, and my
>> profiles seem to say that it is not a cache issue.
>>
>> Have a look at the attached screenshots showing profiles for your test
>> bed reading along the x axis with a cache size of 4 KB (the HDF5 cache
>> subsystem does not come into play at all) and 256 KB (your size). I've
>> also added a profile for the tiled case for comparison purposes. For
>> all the profiles (except tiled), the bottleneck is clearly in the
>> `H5FL_reg_free` and `H5FL_reg_malloc` calls, no matter how large the
>> cache size is (even when it does not come into play).
>>
>> I think this is expected, because HDF5 has to reserve space for each
>> I/O operation. When you walk the dataset along the x or y direction,
>> you have to do (32*2)x more I/O operations than for the tiled case, and
>> HDF5 needs to allocate (and free again!) (32*2)x more memory areas.
>> Also, when you read along the z axis, the extra allocate/release work
>> is (32*32)x. All of this is consistent with both the profiles and
>> running the benchmark manually:
>>
>> fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 t
>> real 0m0.057s
>> user 0m0.048s
>> sys 0m0.004s
>>
>> fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 x
>> setting cache to 32 chunks (4096 bytes) with 3203 slots // forcing no cache
>> real 0m1.055s
>> user 0m0.860s
>> sys 0m0.168s
>>
>> fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 y
>> setting cache to 32 chunks (262144 bytes) with 3203 slots
>> real 0m1.211s
>> user 0m1.176s
>> sys 0m0.028s
>>
>> fal...@antec:/tmp> time ./tHDF5 1024 1024 10 32 32 2 z
>> setting cache to 5 chunks (40960 bytes) with 503 slots
>> real 0m14.813s
>> user 0m14.777s
>> sys 0m0.024s
>>
>> So, in my opinion, there is little that HDF5 can do here. You had
>> better adapt the chunk shape to your most common access pattern (if you
>> have just one, but I know that this is typically not the case).
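(For reference, the "setting cache to ... with ... slots" lines in the transcript above correspond to the chunk cache parameters. A minimal sketch of setting them per dataset with H5Pset_chunk_cache, available since HDF5 1.8.3; the file handle and the dataset name "/data" are made up here:)

  #include "hdf5.h"

  /* Open a dataset with a chunk cache sized like the y-axis run above:
   * 3203 hash slots (a prime, well above the 32 chunks that fit) and
   * 32 * 8192 = 262144 bytes of cache space; 0.75 is the usual chunk
   * preemption policy. */
  static hid_t open_with_cache(hid_t file)
  {
      hid_t dapl = H5Pcreate(H5P_DATASET_ACCESS);
      H5Pset_chunk_cache(dapl, 3203, 32 * 8192, 0.75);

      hid_t dset = H5Dopen2(file, "/data", dapl);
      H5Pclose(dapl);
      return dset;
  }

The hash-size question matters here: rdcc_nslots should stay a prime comfortably larger than the number of chunks the cache can hold, since two chunks that hash to the same slot cannot coexist in the cache and will evict each other.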
> Interesting results. I'll try to comment on a number of fronts:
>
> - The H5FL_* routines are an internal "free list" that the HDF5 library
> uses in order to avoid calling malloc() as frequently. You can disable
> them for your benchmarking by configuring the library with
> "--enable-using-memchecker".
>
> - It looks like you are using the hyperslab routines quite a bit; are
> you creating complicated hyperslabs?
>
> - The B-tree comparison routine is being used to locate the correct
> chunk to perform I/O on, which may or may not be in the cache. Do your
> datasets have fixed or unlimited dimensions? If they are fixed (or have
> only 1 unlimited dimension), I can point you at an experimental branch
> with the new chunk indexing features and we can see if that would help
> your performance.
>
> Quincey
>
>>> In your tests you only mention the chunk size, but not the chunk
>>> shape. Isn't that important? It gives me the impression that in your
>>> tests the data are stored and accessed fully sequentially, which makes
>>> the cache useless.
>>
>> Yes, chunk shape is important; sorry, I forgot this important detail.
>> As I mentioned in a previous message to Rob Latham, I want to optimize
>> 'semi-random' access mode in a certain row of the dataset, so I normally
>> choose the chunk shape as (1, X), where X is the value needed to obtain
>> a chunksize between 1 KB and 8 MB -- if X is larger than the maximum
>> number of columns, I expand the number of rows in the chunk shape
>> accordingly.
>>
>> Thanks,
>>
>> --
>> Francesc Alted
>>
>> <tHDF5-tile.png><tHDF5-x-4KB.png><tHDF5-x-256KB.png>

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@hdfgroup.org
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
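As a closing illustration of the shortcut suggested at the top of this message (with fixed chunk and array sizes, the owning chunk follows from the element coordinates by plain division, so the cache's hash table can be probed before any B-tree search), a rough sketch; the hash scheme and cache layout here are invented for illustration and are not HDF5's actual internals:

  #include "hdf5.h"
  #include <stddef.h>

  /* One cached chunk; real entries would carry the chunk data too. */
  typedef struct chunk_ent { hsize_t index; } chunk_ent;

  /* Return the cache slot holding element (i, j, k), or -1 on a miss,
   * which is the only case that would still need a B-tree lookup. */
  static long probe_chunk_cache(hsize_t i, hsize_t j, hsize_t k,
                                const hsize_t chunk_dims[3],
                                const hsize_t nchunks[3],
                                chunk_ent *const cache[], size_t nslots)
  {
      /* Fixed sizes: the owning chunk follows by integer division. */
      hsize_t ci = i / chunk_dims[0];
      hsize_t cj = j / chunk_dims[1];
      hsize_t ck = k / chunk_dims[2];

      /* Linearize the chunk coordinates and hash into the slot table. */
      hsize_t linear = (ci * nchunks[1] + cj) * nchunks[2] + ck;
      size_t  slot   = (size_t)(linear % nslots);

      if (cache[slot] != NULL && cache[slot]->index == linear)
          return (long)slot;            /* hit: no B-tree search needed */
      return -1;                        /* miss: fall back to the B-tree */
  }

Nothing here requires a B-tree at all when the dimensions are fixed, which is presumably what the experimental chunk-indexing branch exploits.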