On Saturday 27 March 2010 17:39:06 Paul Zumbo wrote:
> Good question.
>
> I had a vision of a structure whereby each chromosome in the human
> genome is a group, and each chromosome group is divided into bases,
> which are also groups; those base groups would be filled with datasets
> from multiple experiments at single-base resolution...
>
> in fact, if possible, I would like to have ~3.5 billion groups!
>
> perhaps a structure like this isn't the best way to approach what I
> want, but...
Definitely, I don't think having 3.5 billion groups can be considered the best approach, at least with HDF5. Even if you use the latest file format (as Elena suggests), you still need around 1 KB per group, so 3.5 billion groups will take 3.5 TB (and perhaps much more once B-tree overhead is counted), and that is just for keeping the *structure*.

I'd suggest putting more data into each dataset so that you can reduce the number of groups to a minimum. With this you will probably still have some B-tree overhead, but with fine-tuned chunksizes for your datasets it can be reduced to a bare minimum. For an example of the kind of improvement you can achieve, see:

http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune

Hope this helps,

-- 
Francesc Alted
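P.S. As a rough sketch of the "more data per dataset" layout (assuming the current PyTables API; the chromosome sizes, dataset names, and chunk size below are only illustrative placeholders, not a definitive implementation):

    import tables

    # Illustrative chromosome lengths; substitute the real assembly sizes.
    CHROM_SIZES = {"chr1": 249250621, "chr2": 243199373}

    with tables.open_file("experiments.h5", mode="w", title="Per-base signal") as h5:
        # Compress each chunk; Blosc is one of the compressors PyTables ships with.
        filters = tables.Filters(complevel=5, complib="blosc")
        for chrom, n_bases in CHROM_SIZES.items():
            # One group per chromosome: a couple dozen groups in total,
            # instead of one group per base (billions of groups).
            grp = h5.create_group("/", chrom, "Signal for %s" % chrom)
            # One chunked array per experiment holds all the per-base values.
            # The chunkshape governs B-tree size and I/O granularity, so it is
            # the main knob for the fine-tuning mentioned above.
            h5.create_carray(grp, "experiment_1",
                             atom=tables.Float32Atom(),
                             shape=(n_bases,),
                             chunkshape=(65536,),  # ~64 K bases per chunk
                             filters=filters)

    # Reading a 100-base window on chr1 only touches the chunks it overlaps:
    with tables.open_file("experiments.h5", mode="r") as h5:
        window = h5.root.chr1.experiment_1[1000000:1000100]

This keeps the group count tiny while still giving you single-base addressing through array slicing.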
