Hi!
On Fri, 2010-04-09 at 09:14 -0500, Peter Cao wrote:
> Hi Johannes,
>
> 22s is about right. Making your chunk size to be about 64kB or 1MB
> will improve the compression ratio and I/O performance. You can try
> different chunk size to get the best result. Compression level 6 should be
> a good choice for i/o performance and compression ratio.

I ran some detailed performance tests this morning (I will try to attach the spreadsheet; I don't know whether this forum allows attachments). For my test data (~1,100,000 variable-length strings, 488 MB in total) I found:

i) The variance of the performance results is *very* surprising: on my otherwise idle development machine, the best and worst measured runs always differed by 25-40%! I have no explanation for this so far. It means that differences below 5% may be caused by random variation alone (I ran each configuration 6-11 times, but given the variance that does not seem enough to me).

ii) There is no compression to speak of: the difference between level -1/0 and level 9 is just 1.39% of the resulting HDF file's size (~970 MB) :-(

iii) "Compression" level 5 seems the best choice when performance is also taken into account (4% overhead, compared to 10% with level 9). But given i), this may be just random variation as well.

iv) My strings are much shorter than yours. With mine, I observe that it is best to write ~350 of them per block with a chunk size of 16K. The number of strings written per block makes the biggest difference: 1 => 425s, 10 => 55s, 100 => 21s, 350 => 20s, 600 => 22s, 1000 => 23s.

v) When always writing 100 strings per block, the chunk size makes a difference of at most 10% (tested with 128 bytes up to 64K). But with a chunk size of 128K, performance degraded by a factor of 10 to 682s for a single run.

Next I will try a fixed-length array type to see some working compression.

Best regards,
Johannes Stamminger
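Regarding point i), a small timing harness can help judge whether a difference between two configurations is larger than the run-to-run noise. The following is only a minimal sketch in plain Python, not the code used for the measurements above. The names `summarize`, `benchmark` and `distinguishable` are assumptions made up for this illustration, and the "ranges do not overlap" rule is a crude heuristic rather than a proper statistical test.

```python
import statistics
import time


def summarize(timings):
    """Return best, median, worst and relative spread of run times in seconds."""
    best, worst = min(timings), max(timings)
    return {
        "best": best,
        "median": statistics.median(timings),
        "worst": worst,
        # (worst - best) / best: 0.25 means the worst run took 25% longer
        # than the best one. Guard against a zero-length best run.
        "spread": (worst - best) / best if best > 0 else float("inf"),
    }


def benchmark(run_once, repeats=6):
    """Call run_once() `repeats` times and summarize the wall-clock timings."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return summarize(timings)


def distinguishable(a, b):
    """Crude heuristic: treat two configurations as different only if
    their [best, worst] timing ranges do not overlap."""
    return a["worst"] < b["best"] or b["worst"] < a["best"]
```

Here `run_once` would be a function (hypothetical, not shown) that writes the whole test data set with one fixed combination of compression level, chunk size and strings per block. With a spread of 25-40% between best and worst run, two configurations whose medians differ by only a few percent would normally come out as not distinguishable under this rule.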
HDF Packet String Writing Performance Comparison.xls
Description: MS-Excel spreadsheet
