Hi,

I've been looking into HDF5 as possible data storage for an application.

I'd appreciate some assistance with understanding writing performance.

The data being written uses a 28-byte compound datatype: one unsigned 8-byte
integer, four 4-byte floats, and one unsigned 4-byte integer (1 x STD_U64LE,
4 x IEEE_F32LE, 1 x STD_U32LE).
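
For reference, the layout I'm describing would look roughly like this in the
C API (a sketch only, field names are illustrative, not my actual code):

    #include <stdint.h>
    #include "hdf5.h"

    /* In-memory record; the compiler may pad this beyond 28 bytes,
       but the packed file datatype below is exactly 28 bytes. */
    typedef struct {
        uint64_t id;
        float    v0, v1, v2, v3;
        uint32_t flags;
    } record_t;

    /* File datatype: 1 x STD_U64LE + 4 x IEEE_F32LE + 1 x STD_U32LE = 28 bytes. */
    static hid_t make_file_type(void)
    {
        hid_t t = H5Tcreate(H5T_COMPOUND, 28);
        H5Tinsert(t, "id",    0,  H5T_STD_U64LE);
        H5Tinsert(t, "v0",    8,  H5T_IEEE_F32LE);
        H5Tinsert(t, "v1",    12, H5T_IEEE_F32LE);
        H5Tinsert(t, "v2",    16, H5T_IEEE_F32LE);
        H5Tinsert(t, "v3",    20, H5T_IEEE_F32LE);
        H5Tinsert(t, "flags", 24, H5T_STD_U32LE);
        return t;
    }

    /* Matching in-memory datatype passed to H5Dwrite(). */
    static hid_t make_mem_type(void)
    {
        hid_t t = H5Tcreate(H5T_COMPOUND, sizeof(record_t));
        H5Tinsert(t, "id",    HOFFSET(record_t, id),    H5T_NATIVE_UINT64);
        H5Tinsert(t, "v0",    HOFFSET(record_t, v0),    H5T_NATIVE_FLOAT);
        H5Tinsert(t, "v1",    HOFFSET(record_t, v1),    H5T_NATIVE_FLOAT);
        H5Tinsert(t, "v2",    HOFFSET(record_t, v2),    H5T_NATIVE_FLOAT);
        H5Tinsert(t, "v3",    HOFFSET(record_t, v3),    H5T_NATIVE_FLOAT);
        H5Tinsert(t, "flags", HOFFSET(record_t, flags), H5T_NATIVE_UINT32);
        return t;
    }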

The chunk size is set to 5500 elements, and the cache settings are left at
their defaults.
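
Concretely, the dataset is created roughly along these lines (a simplified
sketch; file name and dataset name are made up, and make_file_type() is the
compound type from the sketch above):

    hsize_t dims[1]    = {0};
    hsize_t maxdims[1] = {H5S_UNLIMITED};
    hsize_t chunk[1]   = {5500};   /* 5500 x 28 bytes = 154,000 bytes per chunk */

    hid_t file  = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(1, dims, maxdims);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);

    /* Chunk cache is left at its defaults; it could be tuned per dataset
       via H5Pset_chunk_cache() on a dataset access property list. */
    hid_t dset = H5Dcreate2(file, "records", make_file_type(), space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);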

The writing is done one row at a time (appending to the end, using the dataset
API, not the packet table or table APIs), and the performance I get is around
200,000 rows per second, which is below my expectations.
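
Per row, the append looks roughly like this (again a sketch, not the actual
code): extend the dataset by one element, select the new element, and write a
single record.

    /* Append one record at position nrows_written (0-based). */
    static herr_t append_row(hid_t dset, hid_t mem_type,
                             const record_t *row, hsize_t nrows_written)
    {
        hsize_t new_size = nrows_written + 1;
        hsize_t start    = nrows_written;
        hsize_t count    = 1;

        if (H5Dset_extent(dset, &new_size) < 0)
            return -1;

        hid_t fspace = H5Dget_space(dset);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &count, NULL);

        hid_t  mspace = H5Screate_simple(1, &count, NULL);
        herr_t status = H5Dwrite(dset, mem_type, mspace, fspace, H5P_DEFAULT, row);

        H5Sclose(mspace);
        H5Sclose(fspace);
        return status;
    }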

1. Is this the expected performance, or am I possibly doing something wrong?

2. If I understand correctly, even though I'm writing one row at a time, the
data isn't actually written to disk until the chunk is evicted from the cache;
only at that point is the entire chunk written out (until then, each write only
touches the chunk in the cache). If that's true, I would expect the performance
to be close to writing blocks of 5500 x 28 bytes (chunk size x record size) =
154,000 bytes to the HDD, which I would expect to perform at least 5x better.

Is my understanding correct? 

Does writing one record at a time cause overhead? If it does, where is the
overhead coming from?
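
For comparison, the kind of bulk write I have in mind is something like the
following (hypothetical: buffer a full chunk's worth of rows in memory and
issue one H5Dwrite() per 5500 records):

    /* Append n buffered records in a single H5Dwrite() call. */
    static herr_t append_batch(hid_t dset, hid_t mem_type,
                               const record_t *rows, hsize_t n,
                               hsize_t nrows_written)
    {
        hsize_t new_size = nrows_written + n;
        hsize_t start    = nrows_written;

        if (H5Dset_extent(dset, &new_size) < 0)
            return -1;

        hid_t fspace = H5Dget_space(dset);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, &start, NULL, &n, NULL);

        hid_t  mspace = H5Screate_simple(1, &n, NULL);
        herr_t status = H5Dwrite(dset, mem_type, mspace, fspace, H5P_DEFAULT, rows);

        H5Sclose(mspace);
        H5Sclose(fspace);
        return status;
    }

If my assumption about chunk caching holds, I would have expected the per-row
path to end up reasonably close to this bulk path.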
