Hi!

On Fri, 2010-04-09 at 09:14 -0500, Peter Cao wrote:
> Hi Johannes,
> 
> 22s is about right. Making your chunk size to be about 64kB or 1MB
> will improve the compression ratio and I/O performance. You can try
> different chunk size to get the best result. Compression level 6 should be
> a good choice for i/o performance and compression ratio.

I ran some detailed performance tests this morning (I will try to attach
the spreadsheet; I don't know whether this forum allows attachments).

For my test data (~1,100,000 variable-length strings, 488MB in total)
I found:

i) The variance of the performance results is *very* surprising: on my
(otherwise idle) development machine, the best and worst measured runs
always differed by 25-40%! I have no explanation for this so far ...
This means that differences below 5% may be caused by chance alone (I
ran each configuration 6-11 times, but given the variance that does not
seem to be enough).
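
To judge whether a difference between two configurations is more than
noise, the repeated timings of each configuration can be reduced to
best, worst, median and relative spread. A minimal Python sketch; the
function name and the sample timings are made up for illustration and
are not taken from the spreadsheet:

```python
import statistics

def summarize_runs(timings):
    """Summarize repeated wall-clock timings (in seconds) of one configuration."""
    best = min(timings)
    worst = max(timings)
    return {
        "best": best,
        "worst": worst,
        "median": statistics.median(timings),
        # Gap between worst and best run, relative to the best run, in percent.
        "spread_pct": (worst - best) / best * 100.0,
    }

# Hypothetical timings for six runs of one configuration (not measured data).
runs = [20.0, 21.5, 23.0, 24.0, 26.0, 27.0]
summary = summarize_runs(runs)
print(summary)
```

If the spread within one configuration is far larger than the gap
between the medians of two configurations, that gap should probably not
be read as a real effect.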

ii) One can hardly speak of compression at all: the difference between
level -1/0 and level 9 is just 1.39% in the resulting HDF file's size
(~970MB) :-(

iii) "Compression" level 5 seems the best choice when performance is
also taken into account (4% overhead compared to 10% with level 9).
But in light of i), this may be just chance as well ...

iv) My strings are much shorter than yours. With mine, I observe that it
is best to write ~350 of them per block with a chunk size of 16K. The
number of strings written per block makes the biggest difference: 1 =>
425s, 10 => 55s, 100 => 21s, 350 => 20s, 600 => 22s, 1000 => 23s.
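
The timings in iv) can be turned into throughput and a slowdown factor
relative to the best block size. A small Python sketch using only the
numbers quoted above; the total of 1,100,000 strings is the approximate
figure for the test data, and the variable names are illustrative:

```python
# Total run time in seconds per block size (strings written per block), from iv).
timings = {1: 425, 10: 55, 100: 21, 350: 20, 600: 22, 1000: 23}
TOTAL_STRINGS = 1_100_000  # approximate size of the test data set

# Block size with the shortest total run time.
best_block = min(timings, key=timings.get)

def throughput(block_size):
    """Strings written per second for the given block size."""
    return TOTAL_STRINGS / timings[block_size]

for block_size, seconds in sorted(timings.items()):
    slowdown = seconds / timings[best_block]
    print(f"{block_size:>5} strings/block: {seconds:>4} s, "
          f"{throughput(block_size):>9.0f} strings/s, "
          f"{slowdown:5.2f}x the best time")
```

Given the run-to-run variance described in i), the gaps between 100 and
1000 strings per block are probably within noise; only the results for
1 and 10 strings per block stand out clearly.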

v) When always writing 100 strings per block, the chunk size makes a
difference of at most 10% (tested with 128 bytes up to 64K). But with a
chunk size of 128K, performance degraded by a factor of 10 to 682s for a
single run.



Next I will try a fixed-length array type to see whether compression
actually works then.
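
As a rough way to estimate what fixed-length records might gain before
changing the HDF5 code, the strings can be padded to a common width and
compressed with deflate at a few levels. This is only a sketch: Python's
zlib module stands in for the deflate filter, nothing here calls the
HDF5 API, and both the helper pack_fixed and the sample strings are
hypothetical:

```python
import zlib

def pack_fixed(strings, width):
    """Encode strings as fixed-width records, padded with NUL bytes.

    Records longer than `width` bytes are truncated (which could split a
    multi-byte UTF-8 character; acceptable for this size estimate only).
    """
    out = bytearray()
    for s in strings:
        raw = s.encode("utf-8")[:width]
        out += raw.ljust(width, b"\0")
    return bytes(out)

# Hypothetical sample data standing in for the real packet strings.
strings = [f"packet {i}: status=OK value={i % 17}" for i in range(10_000)]
width = max(len(s.encode("utf-8")) for s in strings)
block = pack_fixed(strings, width)

for level in (1, 5, 9):
    compressed = zlib.compress(block, level)
    print(f"level {level}: {len(block)} -> {len(compressed)} bytes")
```

The ratio seen on real data will depend on how repetitive the strings
are and on how much padding the chosen width introduces.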


Best regards,
Johannes Stamminger

Attachment: HDF Packet String Writing Performance Comparison.xls
Description: MS-Excel spreadsheet

