Hi Mark,

On Jun 23, 2010, at 5:54 PM, Mark Miller wrote:

> Hi All,
> 
> I have to admit I've been totally baffled by meaning of the 1/2 rank and
> 1/2 node size parameters controlling B-Tree storage for groups and
> chunked datasets. What do these parameters mean in terms of the
> arrangement of groups in an HDF5 file and number of items, on average in
> a group and/or average depth of the hierarchy of groups? I've even
> googled these terms and don't find useful information.

        Another place we should improve our documentation... *sigh*  Each
B-tree node is allowed to hold a number of entries between the "1/2 rank" and
twice that value; only the root of the B-tree can have fewer.  (Twice the
1/2 rank is the "full rank", maybe?  The Wikipedia page for B-trees
(http://en.wikipedia.org/wiki/B-tree) calls this the "order" of the B-tree.)
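
        To make the rule above concrete, here is a minimal sketch of the
entry-count bounds in Python.  It is a simplified model of the rule as stated
here, not the HDF5 library's code; the names entry_bounds, node_is_valid and
half_rank are made up for illustration, and treating the root's minimum as 0
is an assumption.

```python
def entry_bounds(half_rank, is_root=False):
    """Return (min_entries, max_entries) for one node in this simplified model."""
    if half_rank < 1:
        raise ValueError("half_rank must be at least 1")
    max_entries = 2 * half_rank          # the "full rank"
    # Assumption: the root may hold any smaller number of entries, down to 0.
    min_entries = 0 if is_root else half_rank
    return min_entries, max_entries


def node_is_valid(n_entries, half_rank, is_root=False):
    """Check whether a node's entry count falls inside the allowed range."""
    lo, hi = entry_bounds(half_rank, is_root)
    return lo <= n_entries <= hi


if __name__ == "__main__":
    # 16 is only an illustrative 1/2 rank value.
    print(entry_bounds(16))
    print(entry_bounds(16, is_root=True))
    print(node_is_valid(3, 16), node_is_valid(3, 16, is_root=True))
```

        With a 1/2 rank of 16, a non-root node holding only 3 entries would
be outside the allowed range, while a root node with 3 entries is fine.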

> If my files typically have between 2 and 5 groups deep with between 5
> and 100 objects in a group, what should I set these parameters to?

        These parameters won't affect the depth of your group hierarchy, but
if you have small numbers of links in a group (or chunks in a dataset), you
can reduce the 1/2 rank value to be ~1/2 of the number of links in the largest
group.  Basically, these parameters set the maximum "fan out" from each node
in the B-tree.  If you have a small number of entries in your B-trees,
reducing the parameter values should make your file smaller and may improve
your performance.  However, if you reduce the parameter values too far, the
depth of the B-tree will increase and things will get worse again.  I don't
normally suggest tweaking these values unless you have unusually weird file
layouts, like 10,000 links in a group or all the groups having only one link
in them, etc.
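
        The trade-off above can be sketched with a bit of arithmetic.  This is
a rough model only: it assumes every node is completely full (2 * 1/2 rank
entries), ignores the separate 1/2 node size parameter, and is not how the
library computes anything.  The function names are hypothetical.

```python
def suggest_half_rank(max_links):
    """Rule of thumb from above: about half the links in the largest group."""
    if max_links < 1:
        raise ValueError("max_links must be at least 1")
    return max(1, (max_links + 1) // 2)


def min_levels(n_entries, half_rank):
    """Fewest B-tree levels able to index n_entries, assuming full nodes.

    Each node is assumed to fan out to 2 * half_rank entries or children,
    so a tree with L levels can index at most (2 * half_rank) ** L entries.
    """
    if n_entries < 1 or half_rank < 1:
        raise ValueError("n_entries and half_rank must be at least 1")
    fan_out = 2 * half_rank
    levels, capacity = 1, fan_out
    while capacity < n_entries:
        capacity *= fan_out
        levels += 1
    return levels


if __name__ == "__main__":
    k = suggest_half_rank(100)           # largest group has 100 links
    print(k, min_levels(100, k))         # everything fits in a single node
    # Reducing the value too far makes the tree deeper for a big group.
    for half_rank in (16, 4, 2):
        print(half_rank, min_levels(10000, half_rank))
```

        In this model a group of 100 links fits in a single node once the
1/2 rank is 50, while pushing the 1/2 rank down to 2 for a 10,000-link group
needs several more levels than a 1/2 rank of 16 does.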

        Quincey


