Hi Ken,
On May 22, 2012, at 2:55 PM, Leiter, Kenneth Mr CIV USA USAMC wrote:
> UNCLASSIFIED
> Hello,
>
> I have an application that reads many datasets located in many HDF5 files.
> With my current implementation, I do not know whether I will read from a given
> file more than once, so I open and close the file each time I read a dataset.
>
> I profiled my code, which reads many thousands of datasets in a few hundred
> files, and found that creation and destruction of the metadata cache take up a
> significant portion of the runtime:
>
> H5AC_create = 10.3%
> H5AC_dest = 35%
>
> All of the H5AC_dest time is spent in H5C_flush_invalidate_cache.
>
> By comparison, H5Dopen and H5Dread together take 9.2% of the runtime.
>
> I know ahead of time the location in the HDF5 file of the dataset I want to
> read, and (to my knowledge) I don't require any metadata in order to read it.
> I thought that disabling metadata creation might help, but I don't see a way
> of doing that (I am using 1.8.5-patch1). I attempted to set the minimum and
> maximum metadata cache sizes to 1024 bytes (the minimum allowed) but saw no
> improvement in performance. Does anyone know a way of getting around this
> problem other than avoiding repeated file opens and closes?
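(For concreteness, the access pattern described above, including the attempted
cache shrink, might look roughly like the sketch below. The file name, dataset
path, element type, and the exact limits are placeholders rather than anything
from the original message; H5Pset_mdc_config is the 1.8.x call for adjusting
the metadata cache configuration.)

    #include "hdf5.h"

    /* Open, read one dataset, close - repeated for every read, as described
     * above.  The file and dataset names stand in for the real ones. */
    static herr_t read_one(const char *fname, const char *dname, double *buf)
    {
        H5AC_cache_config_t cfg;
        hid_t fapl, file, dset;
        herr_t status;

        fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Fetch the default metadata cache configuration and shrink it;
         * this mirrors the 1024-byte min/max attempt mentioned above. */
        cfg.version = H5AC__CURR_CACHE_CONFIG_VERSION;
        H5Pget_mdc_config(fapl, &cfg);
        cfg.set_initial_size = 1;
        cfg.initial_size = 1024;
        cfg.min_size = 1024;
        cfg.max_size = 1024;
        H5Pset_mdc_config(fapl, &cfg);

        file = H5Fopen(fname, H5F_ACC_RDONLY, fapl);  /* cache created here (H5AC_create) */
        dset = H5Dopen(file, dname, H5P_DEFAULT);
        status = H5Dread(dset, H5T_NATIVE_DOUBLE, H5S_ALL, H5S_ALL,
                         H5P_DEFAULT, buf);

        H5Dclose(dset);
        H5Fclose(file);   /* cache flushed and destroyed here (H5AC_dest) */
        H5Pclose(fapl);
        return status;
    }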
You will need to read metadata in order to read your dataset - the
dataset's object header has to be looked up from the dataset's name, etc. I
doubt that reducing the size of the metadata cache will help, and since all
metadata access is performed through the cache, it can't really be disabled in
a meaningful way without significant changes to the library. Can you push the
profiling into the H5AC_create and H5AC_dest (H5C_flush_invalidate_cache)
calls a bit further and see if there are some algorithmic issues that are
slowing things down for you?
Quincey
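For what it's worth, the alternative the question alludes to - keeping each
file open and reusing its handle, so the metadata cache is created and torn
down once per file rather than once per dataset read - could be sketched
roughly as below. The fixed-size table, the linear lookup, and all names are
purely illustrative.

    #include <string.h>
    #include "hdf5.h"

    /* Toy handle cache: reuse an already-open file instead of reopening it. */
    #define MAX_FILES 512

    static struct { char name[256]; hid_t fid; } open_files[MAX_FILES];
    static int n_open = 0;

    static hid_t get_file(const char *fname)
    {
        int i;

        for (i = 0; i < n_open; i++)
            if (strcmp(open_files[i].name, fname) == 0)
                return open_files[i].fid;             /* reuse existing handle */

        if (n_open == MAX_FILES)
            return -1;                                /* toy cache is full */

        open_files[n_open].fid = H5Fopen(fname, H5F_ACC_RDONLY, H5P_DEFAULT);
        strncpy(open_files[n_open].name, fname, sizeof(open_files[n_open].name) - 1);
        return open_files[n_open++].fid;
    }

    /* Call once, after all reads are finished. */
    static void close_all(void)
    {
        int i;

        for (i = 0; i < n_open; i++)
            H5Fclose(open_files[i].fid);
        n_open = 0;
    }

With something along these lines, H5AC_create and H5AC_dest run once per file
instead of once per read, which is where the profile above says the time is
going.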