Hi Quincey,

On Fri, Oct 15, 2010 at 06:08:19PM -0500, Quincey Koziol wrote:
> > If I leave out the creation of the datasets (i.e. just create
> > 100,000 groups) the size of the file drops to about 80 MB,
> > so creating a single group seems to "cost" about 800 bytes.
> 
>       About what I'd expect.
> 
> > Creating just 100,000 datasets (without groups) seems to be
> > less expensive; here the overhead seems to be on the order
> > of 350 bytes per dataset. Does that seem reasonable to you?
> 
>       That sounds approximately correct also.
>
>       Adding those two numbers together gives me ~115MB. Plus 100,000 * 5 * 8
> bytes (for the raw data) brings things up to ~120MB. So there's
> approximately 24MB "missing" from the equation somewhere. (dark metadata!
> :-)

>       Pointing h5stat with the "-f -F -g -G -d -D -T -A -s -S" options at the
> file produced gives only 16488 bytes of unaccounted for space, so not very
> much space has been wasted due to internal free space fragmentation. There's
> 27,200,000 bytes of space used for dataset object headers, right around the
> 300 bytes per dataset you mention, so that's OK. There's 95,291,840 bytes of
> B-tree information and 13,441,824 bytes of heap information for groups
> (~1087 bytes per group), which is above the 800 bytes per group that you
> mention and accounts for the missing space in the file.

Thanks, I see. I hadn't expected that creating a group or dataset
would require that much space in the file. In the future I will
try to avoid using excessive amounts of them ;-)

>       Changing your HDF5Writer constructor to be this:
> 
>    HDF5Writer( H5std_string const & fileName )
>    {
>        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
>        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
>        FileAccPropList fileAccessPList(fapl);
>        m_file = new H5File( fileName, H5F_ACC_TRUNC,
>                             FileCreatPropList::DEFAULT, fileAccessPList );
>        m_group = new Group( m_file->openGroup( "/" ) );
>    }
> 
>       (which enables the "latest/latest" option via H5Pset_libver_bounds)
> gives a file that is only 50MB with 41543 bytes of unaccounted space, and
> only ~177 bytes of metadata information per group (although a bit more for
> the dataset objects at ~284 each, curiously). That's probably a good option
> for you here, and you could tweak it down further, if you wanted, with the
> H5Pset_link_phase_change and H5Pset_est_link_info calls. The one drawback of
> using this option is that the files created will only be able to be read by
> the 1.8.x releases of the library.

Thank you for the tips! I guess it's no problem if my program
only supports files written with 1.8.x, so I will probably use
just that.
                   Thank you very much and best regards, Jens
-- 
  \   Jens Thoms Toerring  ________      [email protected]
   \_______________________________      http://toerring.de

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
