Hi Mark,
On Jun 23, 2010, at 5:54 PM, Mark Miller wrote:
> Hi All,
>
> I have to admit I've been totally baffled by the meaning of the 1/2 rank and
> 1/2 node size parameters controlling B-Tree storage for groups and
> chunked datasets. What do these parameters mean in terms of the
> arrangement of groups in an HDF5 file and the number of items, on average, in
> a group and/or the average depth of the hierarchy of groups? I've even
> googled these terms and didn't find useful information.
Another place we should improve our documentation... *sigh* B-tree
nodes are allowed to have between the "1/2 rank" and twice that value
number of entries in them (except for the root of the B-tree, which can have
fewer). Twice the 1/2 rank would be the "full rank", maybe? The Wikipedia page
for B-trees (http://en.wikipedia.org/wiki/B-tree) calls this the "order" of
the B-tree.
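To make the bounds above concrete, here is a small Python sketch. It is not HDF5 code: the function names are made up for illustration, the value 16 is just an example, and treating the root as having no lower bound at all is my assumption (the text above only says the root can have fewer entries).

```python
def entry_bounds(half_rank, is_root=False):
    """Return (min_entries, max_entries) allowed in one B-tree node."""
    if half_rank < 1:
        raise ValueError("half_rank must be at least 1")
    # A node may hold up to twice the 1/2 rank.
    max_entries = 2 * half_rank
    # Assumption: the root has no lower bound (e.g. an empty group).
    min_entries = 0 if is_root else half_rank
    return min_entries, max_entries


def node_is_valid(num_entries, half_rank, is_root=False):
    """Check whether a node with num_entries entries respects the bounds."""
    low, high = entry_bounds(half_rank, is_root)
    return low <= num_entries <= high


if __name__ == "__main__":
    # Illustrative 1/2 rank values only.
    for k in (2, 16):
        print(k, entry_bounds(k), entry_bounds(k, is_root=True))
```

So with a 1/2 rank of 16, a non-root node holding 3 entries would be too empty, while a root node holding 3 entries is fine.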
> If my files typically have between 2 and 5 groups deep with between 5
> and 100 objects in a group, what should I set these parameters to?
These parameters won't affect the depth of your group hierarchy, but if
you have small numbers of links in a group (or chunks in a dataset), you can
reduce the 1/2 rank value to be ~1/2 of the number of links in the largest
group. Basically these parameters affect the maximum "fan out" from each node
in the B-tree, and if you have a small number of entries in your B-trees, your
file size should be smaller and your performance may improve by reducing the
parameter values. However, if you reduce the parameter values too far, the
depth of the B-tree will increase and things will get worse again. I don't
normally suggest tweaking these values unless you have unusual file layouts,
like 10,000 links in a group or all the groups having only one link in them.
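The trade-off can be sketched with a bit of arithmetic in Python. This is a rough model, not what the library actually computes: it pretends every link (or chunk) is an entry sitting directly in a B-tree node that holds at most twice the 1/2 rank, and it ignores the 1/2 node size parameter entirely. The function names are hypothetical. (In the C API, the 1/2 rank for groups is the ik argument of H5Pset_sym_k and the one for chunked datasets is the ik argument of H5Pset_istore_k, both set on the file creation property list.)

```python
import math


def suggested_half_rank(largest_group_links):
    """Rule of thumb from the text: about half the links in the largest group."""
    return max(1, math.ceil(largest_group_links / 2))


def min_levels(num_entries, half_rank):
    """Fewest B-tree levels needed if every node is completely full.

    Simplified model (an assumption): each node fans out to at most
    2 * half_rank entries, so d levels reach (2 * half_rank) ** d entries.
    """
    capacity = 2 * half_rank
    levels = 1
    reach = capacity
    while reach < num_entries:
        reach *= capacity
        levels += 1
    return levels


if __name__ == "__main__":
    # Groups in the question hold between 5 and 100 objects.
    k = suggested_half_rank(100)
    print("suggested 1/2 rank:", k)
    print("levels for 100 links:", min_levels(100, k))
    # Shrinking the 1/2 rank too far makes the tree deeper.
    print("levels for 100 links with 1/2 rank 2:", min_levels(100, 2))
```

In this model 100 links fit in a single node when the 1/2 rank is 50, but need several levels when the 1/2 rank is cut down to 2, which is the "things will get worse again" case.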
Quincey
_______________________________________________
Hdf-forum is for HDF software users discussion.
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org