Hi Quincey,
On Fri, Oct 15, 2010 at 06:08:19PM -0500, Quincey Koziol wrote:
> > If I leave out the creation of the datasets (i.e. just create
> > 100.000 groups) the size of the file drops to about 80 MB,
> > so creating a single group seems to "cost" about 800 bytes.
>
> About what I'd expect.
>
> > Creating just 100.000 datasets (without groups) seems to be
> > less expensive, here the overhead seems to be in the order
> > of 350 bytes per dataset. Does that seem reasonable to you?
>
> That sounds approximately correct also.
>
> Adding those two numbers together gives me ~115MB. Plus 100,000 * 5 * 8
> bytes (for the raw data) brings things up to ~120MB. So there's
> approximately 24MB "missing" from the equation somewhere. (dark metadata!
> :-)
> Pointing h5stat with the "-f -F -g -G -d -D -T -A -s -S" options at the
> file produced gives only 16488 bytes of unaccounted for space, so not very
> much space has been wasted due to internal free space fragmentation. There's
> 27,200,000 bytes of space used for dataset object headers (~272 bytes per
> dataset, a bit below the 350 bytes you mention), so that's OK. There's 95,291,840 bytes of
> B-tree information and 13,441,824 bytes of heap information for groups
> (~1087 bytes per group), which is above the 800 bytes per group that you
> mention and accounts for the missing space in the file.
Thanks, I see. I hadn't expected that creating a group or data
set would require that much space in the file. In the future
I will try to avoid using excessive amounts of them;-)
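
For the record, the per-object figures above follow directly from the h5stat byte counts. Here is a small sketch of the arithmetic, using only numbers quoted in this thread; the constant and function names are made up for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the thread (h5stat output and the test setup).
constexpr std::int64_t kNumObjects      = 100000;    // groups, and datasets, created
constexpr std::int64_t kGroupBtreeBytes = 95291840;  // B-tree information for groups
constexpr std::int64_t kGroupHeapBytes  = 13441824;  // heap information for groups
constexpr std::int64_t kDatasetHdrBytes = 27200000;  // dataset object headers
constexpr std::int64_t kRawDataBytes    = kNumObjects * 5 * 8;  // raw data, as in the thread

// Average group metadata (B-tree plus heap) per group, rounded down.
constexpr std::int64_t bytesPerGroup()
{
    return (kGroupBtreeBytes + kGroupHeapBytes) / kNumObjects;
}

// Average object header size per dataset.
constexpr std::int64_t bytesPerDataset()
{
    return kDatasetHdrBytes / kNumObjects;
}

// Total of the categories listed above (does not include other file metadata).
constexpr std::int64_t accountedBytes()
{
    return kGroupBtreeBytes + kGroupHeapBytes + kDatasetHdrBytes + kRawDataBytes;
}

// Sanity check against the per-object figures mentioned in the thread.
inline void checkQuotedFigures()
{
    assert(bytesPerGroup() == 1087);   // "~1087 bytes per group"
    assert(bytesPerDataset() == 272);  // 27,200,000 / 100,000
}
```

Calling checkQuotedFigures() confirms the ~1087 bytes per group; the four categories add up to 139,933,664 bytes.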
> Changing your HDF5Writer constructor to be this:
>
> HDF5Writer( H5std_string const & fileName )
> {
>     // File access property list requesting the latest file format
>     hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>     H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
>     FileAccPropList FileAccessPList(fapl);
>     m_file = new H5File( fileName, H5F_ACC_TRUNC,
>                          FileCreatPropList::DEFAULT,
>                          FileAccessPList );
>     m_group = new Group( m_file->openGroup( "/" ) );
> }
>
> (which enables the "latest/latest" option to H5Pset_libver_bounds) gives a
> file that is only 50MB with 41543 bytes of unaccounted space, and only has
> ~177 bytes of metadata information per group (although a bit more for
> the dataset objects at ~284 each, curiously). That's probably a good option
> for you here, and you could tweak it down further, if you wanted, with the
> H5Pset_link_phase_change and H5Pset_est_link_info calls. The one drawback of
> using this option is that the files created will only be able to be read by
> the 1.8.x releases of the library.
Thank you for the tips! I guess it's not a problem if my program
only supports files written with 1.8.x, so I probably will use
just that.
Thank you very much and best regards, Jens
--
\ Jens Thoms Toerring ________ [email protected]
\_______________________________ http://toerring.de
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org