On Saturday 27 March 2010 17:39:06 Paul Zumbo wrote:
> Good question.
>
> I had a vision of a structure whereby each chromosome in the human
> genome is a group, and each chromosome group is divided into bases,
> which are also groups; those base groups would be filled with datasets
> from multiple experiments at single-base resolution...
>
> in fact, if possible, I would like to have ~3.5 billion groups!
>
> perhaps a structure like this isn't the best way to approach what I
> want, but...
Definitely, I don't think having 3.5 billion groups can be considered the best approach, at least with HDF5. Even if you use the latest file format (as Elena suggests), you still need around 1 KB per group, so 3.5 billion groups will take 3.5 TB (and perhaps much more once B-tree overhead is counted), and that is just for keeping the *structure*.

I'd suggest putting more data into each dataset so that you can reduce the number of groups to a minimum. With this you will probably still have some B-tree overhead, but with fine-tuned chunksizes for your datasets it can be reduced to a bare minimum. For an example of the kind of improvement you can achieve, see:

http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune

Hope this helps,

-- 
Francesc Alted
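P.S. As a rough sketch of the "more data per dataset" layout (assuming the current PyTables API; the chromosome sizes, dataset names, and chunk size below are only illustrative placeholders, not a definitive implementation):

    import tables

    # Illustrative chromosome lengths; substitute the real assembly sizes.
    CHROM_SIZES = {"chr1": 249250621, "chr2": 243199373}

    with tables.open_file("experiments.h5", mode="w", title="Per-base signal") as h5:
        # Compress each chunk; Blosc is one of the compressors PyTables ships with.
        filters = tables.Filters(complevel=5, complib="blosc")
        for chrom, n_bases in CHROM_SIZES.items():
            # One group per chromosome: a couple dozen groups in total,
            # instead of one group per base (billions of groups).
            grp = h5.create_group("/", chrom, "Signal for %s" % chrom)
            # One chunked array per experiment holds all the per-base values.
            # The chunkshape governs B-tree size and I/O granularity, so it is
            # the main knob for the fine-tuning mentioned above.
            h5.create_carray(grp, "experiment_1",
                             atom=tables.Float32Atom(),
                             shape=(n_bases,),
                             chunkshape=(65536,),  # ~64 K bases per chunk
                             filters=filters)

    # Reading a 100-base window on chr1 only touches the chunks it overlaps:
    with tables.open_file("experiments.h5", mode="r") as h5:
        window = h5.root.chr1.experiment_1[1000000:1000100]

This keeps the group count tiny while still giving you single-base addressing through array slicing.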
