On Monday 12 April 2010 13:58:24, Stamminger, Johannes wrote:
> I did some detailed performance tests this morning (I will try to attach
> the spreadsheet - I don't know whether this forum allows attachments).
> 
> For my test data (~1,100,000 varying-length strings, totalling 488 MB in
> size) I found
> 
> i) the variance of the performance results *very* surprising: on my
> (otherwise idle) development machine the best and worst measured runs
> always differed by 25-40%! No explanation for this comes to my mind so
> far ...
> This means that differences below 5% may be caused by randomness alone
> (I ran each configuration 6-11 times - but given the variance this does
> not seem enough to me).

That could be a consequence of the disk cache subsystem of the OS at work.  
If you want better reproducibility in your results, try to flush the OS 
cache (sync on UNIX-like OSes) before taking time measurements.  Of course, 
you may not be interested in measuring your disk I/O but only the disk 
cache subsystem throughput; however, that is always tricky to do.
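Something along these lines is what I mean (an untested, POSIX-only sketch;
benchmark() is just a placeholder for whatever HDF5 write you are timing):

#include <time.h>
#include <unistd.h>

/* Time one benchmark run, flushing pending writes first.  On Linux you
 * can go further and drop the page cache as root with
 * "echo 3 > /proc/sys/vm/drop_caches" between runs. */
double timed_run(void (*benchmark)(void))
{
    struct timespec t0, t1;

    sync();                               /* flush the OS write cache */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    benchmark();                          /* the write loop under test */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}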

> 
> ii) one cannot speak of any compression: the difference from level -1/0
> to level 9 is just 1.39% in the resulting HDF5 file's size
> (~970 MB) :-(

As far as I know, compression of variable-length types is not supported by 
HDF5 yet.  By forcing the use of a compression filter there, you are only 
compressing the *pointers* to your variable-length values, not the values 
themselves.
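To make this concrete, here is a sketch of the kind of setup where the
filter only ever sees the descriptors (the `file` handle and element count
are assumed to come from elsewhere):

#include <hdf5.h>

/* Create a chunked, deflate-filtered dataset of variable-length strings
 * under an already-open `file` handle.  The string bytes themselves go
 * to the global heap and stay uncompressed; only the small per-element
 * descriptors stored in the chunks pass through the filter. */
hid_t make_vlen_dataset(hid_t file, hsize_t n)
{
    hid_t str_t = H5Tcopy(H5T_C_S1);
    H5Tset_size(str_t, H5T_VARIABLE);     /* variable-length strings */

    hsize_t dims[1] = {n}, chunk[1] = {350};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    H5Pset_deflate(dcpl, 5);              /* only the descriptors shrink */
    return H5Dcreate2(file, "strings", str_t, space,
                      H5P_DEFAULT, dcpl, H5P_DEFAULT);
}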

> iii) "compression" level 5 seems best choice taking into account
> additionally performance (4% overhead compared to 10% using level 9).
> But regard i) reading this - maybe only a random ...
> 
> iv) my strings are much shorter than yours. With mine I observe that it
> is best to write ~350 of them in a block with a chunk size of 16K. The
> number of strings written per block makes the biggest difference: 1 =>
> 425s, 10 => 55s, 100 => 21s, 350 => 20s, 600 => 22s, 1000 => 23s.
> 
> v) when always writing 100 strings per block, the chunk size makes a
> difference of at most 10% (tested from 128 bytes up to 64K). But with a
> chunk size of 128K, performance degraded by a factor of 10, to 682s for
> a single run.

I don't know about this one, but such a dramatic loss in performance when 
going from a 64 KB to a 128 KB chunk size is certainly strange.  It would 
be nice if you could build a small benchmark showing this performance 
problem, something like the sketch below, and send it to The HDF Group for 
further analysis.
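The benchmark could be as small as something like this (an untested sketch
with made-up names; it writes plain bytes rather than your variable-length
strings, so take it only as a starting point):

#include <hdf5.h>

/* Write `nbytes` of raw bytes into a fresh dataset created with the
 * given chunk size, so write times can be compared across chunk sizes
 * (e.g. 64 KB vs 128 KB). */
static void write_with_chunksize(hid_t file, const char *name,
                                 const char *buf, hsize_t nbytes,
                                 hsize_t chunk_bytes)
{
    hsize_t dims[1]  = {nbytes};
    hsize_t chunk[1] = {chunk_bytes};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dcpl  = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 1, chunk);
    hid_t dset  = H5Dcreate2(file, name, H5T_NATIVE_CHAR, space,
                             H5P_DEFAULT, dcpl, H5P_DEFAULT);
    H5Dwrite(dset, H5T_NATIVE_CHAR, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);
    H5Dclose(dset);
    H5Pclose(dcpl);
    H5Sclose(space);
}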

> Next I will try to use a fixed-length array type to see compression
> actually working.

IMO, this is your best bet if you are after compressing your data.  BTW, 
when sending strings to HDF5 containers, be sure to zero the memory buffer 
area after the end of each string: this could improve the compression ratio 
quite a lot.
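For example, something like this when filling a fixed-length buffer
(STR_LEN is an assumed fixed string size):

#include <string.h>

#define STR_LEN 64                        /* assumed fixed string size */

/* Zero the whole slot before copying the string in, so everything after
 * the terminator is a run of '\0' bytes (which deflate squeezes very
 * well) instead of whatever happened to be in memory before. */
void pack_string(char *slot, const char *s)
{
    memset(slot, 0, STR_LEN);
    strncpy(slot, s, STR_LEN - 1);        /* always leaves a final '\0' */
}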

Hope this helps,

-- 
Francesc Alted
