[Hdf-forum] Performance hints for large dataset

Martin Sarajærvi Thu, 12 Jun 2014 13:11:14 -0700

Hi all,

I'm working with floating-point data, building up a very large
four-dimensional dataset (x, y, z, w), typically >100 GB.
In my example the dimensions are (x, y, z, w) = (601, 482, 61, 1501).
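
As a rough check on the size quoted above, the following sketch multiplies out the example dimensions. The 4-byte element size is an assumption (the data is only described as floating point); with 8-byte doubles the result would be twice as large.

```python
from math import prod

# Dimensions (x, y, z, w) taken from the example above.
SHAPE = (601, 482, 61, 1501)

# Assumption: 4-byte single-precision elements. The element type is not
# stated in the post, which only says "floating point".
ITEMSIZE = 4

total_elements = prod(SHAPE)
size_bytes = total_elements * ITEMSIZE

print(f"elements: {total_elements:,}")
print(f"size: {size_bytes / 1e9:.1f} GB ({size_bytes / 2**30:.1f} GiB)")
```

Under that assumption the uncompressed size comes out a little above 100 GB, which matches the figure given.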


The aim is to slice (READING ONLY) this dataset in orthogonal directions:
1) (x, *, *, *)
2) (*, y, *, *)
3) (*, *, z, w)

When using a contiguous layout I naturally get good performance for
directions (1) and (2); however, it is very poor for (3).
Using a chunked layout of (8,8,8,8) seems to give the best balance so far,
with reasonable access times in all directions, but it is still not as fast
as I was hoping for. My tests also show that compression improves the read
performance slightly.
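
To make the trade-off of the (8,8,8,8) chunking more concrete, here is a minimal sketch that counts how many chunks each of the three slice types touches and how much more data has to be read than is actually needed. It is plain arithmetic, not a benchmark. The helper names are hypothetical, and it assumes 4-byte elements, no compression, and partial edge chunks stored at full chunk size.

```python
from math import ceil, prod

SHAPE = (601, 482, 61, 1501)   # (x, y, z, w) from the example above
CHUNK = (8, 8, 8, 8)           # chunk shape mentioned above
ITEMSIZE = 4                   # assumption: 4-byte floats

def chunks_touched(shape, chunk, selection):
    """Count chunks intersected by a slice.

    `selection` holds None for a fully selected dimension ("*") and an
    integer index for a dimension fixed to a single position.
    """
    count = 1
    for n, c, sel in zip(shape, chunk, selection):
        if sel is None:
            count *= ceil(n / c)   # every chunk along this axis is hit
        # a fixed index falls inside exactly one chunk along its axis
    return count

def read_amplification(shape, chunk, selection):
    """Ratio of elements read from disk to elements actually requested."""
    read = chunks_touched(shape, chunk, selection) * prod(chunk)
    wanted = prod(n for n, sel in zip(shape, selection) if sel is None)
    return read / wanted

slices = {
    "1) (x, *, *, *)": (0, None, None, None),
    "2) (*, y, *, *)": (None, 0, None, None),
    "3) (*, *, z, w)": (None, None, 0, 0),
}

for name, sel in slices.items():
    n_chunks = chunks_touched(SHAPE, CHUNK, sel)
    mb_read = n_chunks * prod(CHUNK) * ITEMSIZE / 1e6
    amp = read_amplification(SHAPE, CHUNK, sel)
    print(f"{name}: {n_chunks:,} chunks, {mb_read:.1f} MB read, {amp:.1f}x amplification")
```

With cubic chunks, a slice that fixes one index reads at least 8 times the data it needs, and a slice that fixes two indices at least 64 times, so chunk shapes elongated along the most frequently read directions may be worth testing.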

I'm looking for advice on possible optimization techniques for this
problem beyond those mentioned above.
Otherwise, is my only option to move to some (expensive?) parallel solution?

Thanks!

Regards,
Martin

