On Fri, Aug 5, 2011 at 11:09 PM, Charles Darwin <trok...@yahoo.com> wrote:
> Hi,

Hi Charles,

> 1. Is this the expected performance or am I possibly doing something wrong?

Expected performance is tricky to answer, given how many variables are
involved. A good way to check the upper bound of the performance you can
expect from HDF5 on your machine is to run h5perf_serial, which ships with
the HDF5 tools (see the example invocation at the end of this message).

> 2. If I understand correctly, even though I'm writing 1 row at a time, the
> data isn't actually being written to the disk until the chunk is evicted
> from the cache and it is only at that point that the entire chunk gets
> written to the disk (until then it's only writing to the chunk in the
> cache), if this is true then I would expect the performance to be similar to
> writing bulks of 5500 x 28 (chunk size * compound data type size) = 154,000
> bytes to the HDD which I would expect to perform at least 5x better.
>
> Is my understanding correct?
>
> Does writing 1 record at a time cause overhead? If it does where is the
> overhead coming from?

All of the operations involved in writing data to a dataset have a
non-zero cost: resizing the dataset, allocating the read/write dataspaces,
and performing the write all take some time, time that you have now
brought into your innermost loop. A general strategy for investigating
possible optimizations, one that should serve you well as you continue
with HDF5, is to try several configurations and compare their performance.
That said, I would definitely investigate writing more than one row to the
dataset at a time, along the lines of the sketch below.
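Here is a minimal sketch of that idea: buffer a block of records in memory,
then append them with a single write. Everything in it is illustrative
rather than taken from your code: NROWS, record_t, and append_block are
made-up names, and I am assuming a 1-D dataset of a compound type that was
created chunked and extendible (H5S_UNLIMITED max dimension).

#include "hdf5.h"

#define NROWS 5500                 /* buffer one chunk's worth of rows */

typedef struct {                   /* stand-in for your compound record */
    int    id;
    double values[3];
} record_t;

/* Append nrows records from buf to the end of a 1-D extendible dataset.
 * *total holds the number of rows already written and is updated on
 * success.  Note there is one H5Dset_extent and one H5Dwrite per block,
 * rather than one per row. */
static herr_t append_block(hid_t dset, hid_t memtype, hsize_t *total,
                           const record_t *buf, hsize_t nrows)
{
    hsize_t start   = *total;
    hsize_t newsize = *total + nrows;

    /* Grow the dataset once for the whole block. */
    if (H5Dset_extent(dset, &newsize) < 0)
        return -1;

    /* Select the freshly added rows in the file... */
    hid_t filespace = H5Dget_space(dset);
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET,
                        &start, NULL, &nrows, NULL);

    /* ...describe the in-memory buffer... */
    hid_t memspace = H5Screate_simple(1, &nrows, NULL);

    /* ...and write all nrows records in one call. */
    herr_t status = H5Dwrite(dset, memtype, memspace, filespace,
                             H5P_DEFAULT, buf);

    H5Sclose(memspace);
    H5Sclose(filespace);
    if (status >= 0)
        *total = newsize;
    return status;
}

Filling a record_t buffer and calling append_block once per NROWS rows
turns thousands of resize/select/write cycles into one, which is where I
would expect most of your per-row overhead to be going.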
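As for the baseline in question 1: assuming the HDF5 command-line tools are
installed, the serial benchmark can be run with its built-in defaults,

  $ h5perf_serial

and the reported bandwidth gives you a rough ceiling to compare your
application's numbers against.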
--
Mike Davis
mikeda...@uchicago.edu