I've finally gotten things to the point where I can collect useful performance
numbers for my application. So far things look good. But when I look
at the performance numbers I see behavior I don't expect -- namely,
that my write throughput is almost 2x my read throughput.
My system: x64 Windows XP, NTFS file system, C++, HDF5 compiled with VS
2008 (thread-safe).
My data: dummy GIS data scattered over a region. We have a known grid of
geocells that the data is split into, and we store the data in some
number of HDF5 files such that a given file contains data for
neighboring geocells. I take the GIS data, clip it to lat/lon
boundaries (not really lat/lon, but it's sort of equivalent to
lat/lon), determine which HDF5 file the clipped region should be
stored within, and write each clipped GIS dataset to a separate HDF5
dataset (each dataset is converted to a 1D stream of 32-bit integer
opcodes/data).
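For context, each per-geocell write boils down to something like the sketch
below (illustrative only -- the function and names are made up, not my actual
wrapper API; the chunked/compressed variants swap in a different
dataset-creation property list):

#include <hdf5.h>
#include <cstdint>
#include <string>
#include <vector>

// Write one clipped geocell as a 1-D stream of 32-bit opcodes/data.
void write_geocell(hid_t file, const std::string& name,
                   const std::vector<int32_t>& opcodes)
{
    hsize_t dims[1] = { opcodes.size() };
    hid_t space = H5Screate_simple(1, dims, NULL);

    // Default (contiguous) layout shown here; the tests pass a chunked and/or
    // compressed dataset-creation property list instead of H5P_DEFAULT.
    hid_t dset = H5Dcreate2(file, name.c_str(), H5T_STD_I32LE, space,
                            H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_INT32, H5S_ALL, H5S_ALL, H5P_DEFAULT,
             opcodes.data());

    H5Dclose(dset);
    H5Sclose(space);
}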
I've created a thin wrapper around HDF5 that retains an LRU cache of
recently opened HDF5 files and datasets. It also hides the details of
our HDF5 file hierarchy and the configuration details of our datasets
from its client applications.
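The file-handle side of that wrapper is essentially an LRU map from file path
to open hid_t, roughly like the minimal sketch below (the class name and
capacity handling here are mine for illustration; the real wrapper also caches
dataset handles and hides the file-naming scheme):

#include <hdf5.h>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

class FileHandleCache {
public:
    explicit FileHandleCache(size_t capacity) : cap_(capacity) {}
    ~FileHandleCache() { for (auto& e : lru_) H5Fclose(e.second); }

    hid_t open(const std::string& path, unsigned flags = H5F_ACC_RDWR) {
        auto it = index_.find(path);
        if (it != index_.end()) {                    // hit: move to front
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        if (lru_.size() >= cap_) {                   // full: evict least recently used
            H5Fclose(lru_.back().second);
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
        hid_t f = H5Fopen(path.c_str(), flags, H5P_DEFAULT);
        lru_.emplace_front(path, f);
        index_[path] = lru_.begin();
        return f;
    }

private:
    using LruList = std::list<std::pair<std::string, hid_t>>;
    size_t cap_;
    LruList lru_;
    std::unordered_map<std::string, LruList::iterator> index_;
};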
Everything appears to be working great and I've been doing some
performance testing to determine the effects of
compression/chunksize/contiguous-vs-chunked/etc.
The attached images are the results of running some performance tests
to look at read/write throughput versus chunk size. At each chunk size,
I ran the test 8 times, throwing out the min and max. Each node in the
graph is the mean of the remaining 6 runs; the error bars represent
the stddev.
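In other words, each plotted point is a trimmed mean, computed along these
lines (illustrative helper, not the actual test harness):

#include <algorithm>
#include <cmath>
#include <numeric>
#include <utility>
#include <vector>

// 8 timed runs in; min and max dropped; mean and sample stddev of the rest out.
std::pair<double, double> trimmed_stats(std::vector<double> runs)
{
    std::sort(runs.begin(), runs.end());
    std::vector<double> kept(runs.begin() + 1, runs.end() - 1);  // drop min/max
    double mean = std::accumulate(kept.begin(), kept.end(), 0.0) / kept.size();
    double var = 0.0;
    for (double v : kept) var += (v - mean) * (v - mean);
    return { mean, std::sqrt(var / (kept.size() - 1)) };
}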
The test data was 2 million randomly generated GIS points, split into a
few hundred HDF5 datasets across about 25 HDF5 files.
None - chunked datasets without compression
NoneNoChunk - contiguous datasets
lzf - chunked with LZF compression
zlib1, zlib4, zlib9 - chunked with zlib at different compression levels
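For reference, those variants map onto the dataset-creation property list
roughly as follows (sketch only; chunk_elems is whatever chunk size a given
run is testing, and LZF is a third-party filter that has to be registered
before it can be used):

#include <hdf5.h>

hid_t make_dcpl(bool chunked, int zlib_level, hsize_t chunk_elems)
{
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    if (chunked) {
        hsize_t chunk[1] = { chunk_elems };
        H5Pset_chunk(dcpl, 1, chunk);           // "None" and the compressed variants
        if (zlib_level > 0)
            H5Pset_deflate(dcpl, zlib_level);   // "zlib1" / "zlib4" / "zlib9"
        // For "lzf": the LZF filter (registered filter id 32000) must first be
        // registered via H5Zregister(), then enabled via H5Pset_filter().
    }
    // If !chunked, the layout stays contiguous: the "NoneNoChunk" case.
    return dcpl;
}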
The compression ratios show what I expect: LZF isn't as good as zlib
at compression, and there's minimal difference between the various zlib
levels.
Not shown here are the runs I did with the shuffle filter, which for
my data didn't help compression and just slowed things down. The
compression ratio for NoneNoChunk threw me off for a bit, until I realized
I was seeing the increased file size due to the file space allocated
for partially-used chunks.
The write throughput graph shows LZF considerably better for my data
than the other options at every chunk size. zlib's MB/sec
throughput is significantly worse -- worse even than the contiguous and
uncompressed-chunked cases.
The read graph looks better for zlib -- it outperforms the
no-compression options. But again, LZF has better throughput than zlib.
So I confirmed what I had expected performance-wise. But then I
looked at both the read and write graphs.
On read throughput, my datasets with LZF average 70-80 MB/sec.
But on write throughput, my datasets with LZF average 125 MB/sec.
It doesn't seem to be related to just the compression filter, either. The write
throughput for my contiguous dataset runs (NoneNoChunk) was ~60
MB/sec, and their read throughput was ~45 MB/sec.
Unfortunately, I cannot share my code. Any ideas where to look for
what might be causing this? Or, any hints for how to diagnose these
differences myself?
Writing all this down, I'm starting to wonder whether comparing my
read and write throughput is a valid comparison at all. The way my
performance-testing application writes out the data is different from
the way it reads it.
In both cases, I read/write the same total amount of data and traverse the
same datasets. However, the order in which I traverse the datasets is
different.
My geocell datasets end up laid out similar to a 2D array. In the table below,
each 2-digit number represents a dataset. The spacing represents how those
datasets are stored in separate HDF5 files -- e.g. datasets 00-03,
10-13, 20-23, 30-33 are stored in a single file.
00 01 02 03   04 05 06 07   08 09
10 11 12 13   14 15 16 17   18 19
20 21 22 23   24 25 26 27   28 29
30 31 32 33   34 35 36 37   38 39

40 41 42 43   44 45 46 47   48 49
50 51 52 53   54 55 56 57   58 59
60 61 62 63   64 65 66 67   68 69
70 71 72 73   74 75 76 77   78 79
In my read test, I do a row-major traversal of the datasets (00-09,
10-19, 20-29, etc.). In the write test, that's not the case -- every
dataset is held in a hash map before being written to disk, so the writes
come out in the hash map's iteration order rather than grid order.
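Concretely, for the example grid above, the two orders look something like
this (illustrative only; the dataset names and grid size are just those of the
example):

#include <cstdint>
#include <cstdio>
#include <string>
#include <unordered_map>
#include <vector>

int main()
{
    std::vector<std::string> read_order;     // "00".."09", "10".."19", ...
    std::unordered_map<std::string, std::vector<int32_t>> pending_writes;

    for (int row = 0; row < 8; ++row)
        for (int col = 0; col < 10; ++col) {
            char name[3];
            std::snprintf(name, sizeof name, "%d%d", row, col);
            read_order.push_back(name);      // read test: row-major traversal
            pending_writes[name] = {};       // write test: hashed, unordered
        }
    // The read test walks read_order; the write test drains pending_writes,
    // whose iteration order has no relation to the on-disk file grouping.
    return 0;
}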
Maybe the unexpected throughput behavior is due to my wrapper library's
LRU cache of file handles. The file-handle cache is small
(<5 entries), and depending on the length of a row, by the time the read test
reaches the end of the row and wraps around to the next dataset stored in the
first file, that first file may have fallen out of the cache.
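A tiny model of the suspected effect (the numbers are assumptions for
illustration: 4 files touched per row, a handle cache of capacity 3): with an
LRU cache smaller than the number of files a row touches, every return to the
first file is a miss and forces a reopen.

#include <algorithm>
#include <cstdio>
#include <deque>

int main()
{
    const int files_per_row = 4, capacity = 3, rows = 8;
    std::deque<int> cache;                        // front = most recently used
    int opens = 0;

    for (int r = 0; r < rows; ++r)
        for (int f = 0; f < files_per_row; ++f) { // row-major read hits files cyclically
            auto it = std::find(cache.begin(), cache.end(), f);
            if (it == cache.end()) {
                ++opens;                          // miss: (re)open the file
                if ((int)cache.size() == capacity)
                    cache.pop_back();             // evict least recently used
            } else {
                cache.erase(it);                  // hit: refresh recency
            }
            cache.push_front(f);
        }
    std::printf("%d opens for %d file visits\n", opens, rows * files_per_row);
    return 0;
}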
Will have to fiddle with my cache configuration and see if that
eliminates this behavior.