Hi Ger,

Thanks for your reply.

I have played a bit with the cache size, but have not tried your
particular suggestion. Sure, I could optimize the chunk size for each of
the slicing directions, but this does not really solve my problem, as I
can only have one chunk setting for my dataset. Or am I missing
something? If I optimize for (*,y,*,*), including adjusting the cache
setting, the (*,*,z,w) slicing would still be slow.

I would be interested in trying your program for chunk/cache size
testing (I'm running Linux here).

Best regards,
Martin

On Mon, Jun 16, 2014 at 8:26 AM, Ger van Diepen <[email protected]> wrote:

> Hi Martin,
>
> Have you set the chunk cache sufficiently large? Otherwise it will
> reread the same chunks again and again. Although the system file cache
> might hold all those data, I think it's better to size the cache
> correctly because of the lookups HDF5 is doing.
>
> E.g. in the case of (*,y,*,*) you'll need a cache of 601*8*61*1501
> floats (1.64 GB). I assume you have sufficient memory; otherwise you
> could adjust the chunk size, especially in z,w.
>
> Your chunks are not particularly large (16384 bytes), leading to a lot
> of iops and a large B-tree to index the chunks. On the other hand, when
> enlarging the chunks, you'll need more memory for the chunk cache.
>
> What is the pattern when accessing the data as *,*,z,w? First w, and
> thereafter all z? You'll need a much smaller cache when accessing it
> like
>
>   for w in 0:nw/ncw      (nw is length of w-axis; ncw is chunk size in w)
>     for z in 0:nz/ncz
>       for w1 in 0:ncw
>         for z1 in 0:ncz
>
> In this way you handle a full z,w chunk before moving to the next one,
> so your cache size needs to be only 601*482*8*8.
>
> I have a program testing 3D data sets of arbitrary size and chunk size
> using a cache size depending on the chunk size and access pattern. If
> you like, I can send it.
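
The two cache sizes quoted in Ger's message can be checked with a few lines of Python. This is only a sketch under stated assumptions: values are 4-byte floats (which is what 16384-byte chunks of 8*8*8*8 values imply), chunk padding at the dataset edges is ignored as in the message, and the names cache_bytes and chunk_ordered_zw are made up for illustration. The generator spells out the chunk-ordered loop, with one addition that is not in the message: it clamps the last, partial chunk on each axis.

```python
from math import prod

SHAPE = (601, 482, 61, 1501)  # (x, y, z, w), from the original post
CHUNK = (8, 8, 8, 8)          # chunk shape, from the original post
ITEMSIZE = 4                  # assumption: 4-byte floats (16384 bytes / 8**4 values)


def cache_bytes(fixed_axes):
    """Bytes needed to keep every chunk touched while the fixed axes stay in one chunk.

    Same unpadded estimate as in the message: a fixed axis contributes its
    chunk length, a free axis contributes its full length.
    """
    extents = [CHUNK[a] if a in fixed_axes else SHAPE[a] for a in range(len(SHAPE))]
    return prod(extents) * ITEMSIZE


def chunk_ordered_zw(nz, nw, ncz, ncw):
    """Yield (z, w) pairs so that one z,w chunk is finished before the next starts."""
    for w0 in range(0, nw, ncw):          # chunk index along w
        for z0 in range(0, nz, ncz):      # chunk index along z
            for w in range(w0, min(w0 + ncw, nw)):      # positions inside the chunk
                for z in range(z0, min(z0 + ncz, nz)):
                    yield z, w


if __name__ == "__main__":
    print(cache_bytes({1}) / 2**30)     # (*, y, *, *) slices, in GiB
    print(cache_bytes({2, 3}) / 2**20)  # chunk-ordered (*, *, z, w) slices, in MiB
```

Run as a script, this prints roughly 1.64 (GiB) for the (*,y,*,*) case and roughly 70.7 (MiB) for the chunk-ordered (*,*,z,w) case, which correspond to 601*8*61*1501 and 601*482*8*8 floats.
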
>
> Cheers,
> Ger
>
> >>> Matthieu Brucher <[email protected]> 6/12/2014 10:56 PM >>>
>
> Hi,
>
> Unfortunately, this is indeed the worst case you can have. It's
> completely normal that you get the worst performance when slicing in
> these dimensions. Even with a parallel filesystem, you would need to
> read EVERYTHING from the dataset, and then the library would pick out
> the pieces you need.
>
> One solution would be to agglomerate several z,w in dimensions 5 and
> 6, so that you still get some performance, but it will be worse than
> for direction 1 or even 2.
>
> Cheers,
>
> Matthieu
>
> 2014-06-12 20:43 GMT+01:00 Martin Sarajærvi <[email protected]>:
> > Hi all,
> >
> > I'm working with floating-point data, building up a very large
> > dataset (typically >100 GB) of four dimensions (x, y, z, w).
> > The dimensions are (x, y, z, w) = (601, 482, 61, 1501) in my example.
> >
> > The aim is to slice (READING ONLY) this dataset in orthogonal
> > directions:
> > 1) (x, *, *, *)
> > 2) (*, y, *, *)
> > 3) (*, *, z, w)
> >
> > With a contiguous layout I naturally get good performance for
> > directions (1) and (2), but it is very poor for (3).
> > A chunked layout of (8,8,8,8) seems to give the best balance so far,
> > with reasonable access times in all directions, but it is still not
> > as fast as I was hoping for. My tests also show that compression
> > improves read performance slightly.
> >
> > I'm looking for advice on possible optimization techniques for this
> > problem other than what has been mentioned.
> > Otherwise, is my only option to move to some (expensive?) parallel
> > solution?
> >
> > Thanks!
> >
> > Regards,
> > Martin
> >
> > _______________________________________________
> > Hdf-forum is for HDF software users discussion.
> > [email protected]
> > http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
> > Twitter: https://twitter.com/hdf5
>
> --
> Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com
> LinkedIn: http://www.linkedin.com/in/matthieubrucher
> Music band: http://liliejay.com/
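
To put rough numbers on the trade-off discussed in the thread, the sketch below counts how many (8,8,8,8) chunks a single slice touches in each of the three directions, and compares the bytes that must be read with the bytes the slice actually contains. The assumptions are mine, not from the thread: 4-byte floats, uncompressed chunks, every touched chunk read in full exactly once, and no help from any cache. The function name slice_cost is hypothetical.

```python
from math import ceil, prod

SHAPE = (601, 482, 61, 1501)  # (x, y, z, w), from the original post
CHUNK = (8, 8, 8, 8)          # chunk shape, from the original post
ITEMSIZE = 4                  # assumption: 4-byte floats


def slice_cost(fixed_axes):
    """Return (chunks touched, bytes read, bytes wanted) for one slice.

    A fixed axis selects a single index, so it crosses one chunk along that
    axis; a free axis crosses every chunk along it.
    """
    axes = range(len(SHAPE))
    chunks = prod(1 if a in fixed_axes else ceil(SHAPE[a] / CHUNK[a]) for a in axes)
    bytes_read = chunks * prod(CHUNK) * ITEMSIZE      # whole chunks are read
    bytes_wanted = prod(1 if a in fixed_axes else SHAPE[a] for a in axes) * ITEMSIZE
    return chunks, bytes_read, bytes_wanted


if __name__ == "__main__":
    directions = {"(x,*,*,*)": {0}, "(*,y,*,*)": {1}, "(*,*,z,w)": {2, 3}}
    for label, fixed in directions.items():
        chunks, read, wanted = slice_cost(fixed)
        print(label, chunks, read, wanted, round(read / wanted, 1))
```

Under these assumptions a single (*,*,z,w) slice touches 4636 chunks, reading about 76 MB to deliver about 1.2 MB, while a single (x,*,*,*) slice reads about 8.5 times more than it needs. That is why rereading chunks for neighbouring slices is so costly unless the cache holds them.
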
