Hi,

Try using the H5Pset_libver_bounds function (see 
https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds) 
with H5F_LIBVER_LATEST for the second and third arguments to set up a file 
access property list, and then use that property list when opening an 
existing file or creating a new one.

Here is a C code snippet:

hid_t fapl_id, file_id;

fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
file_id = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);

By default, the HDF5 library uses the earliest version of the file format when 
creating groups. The indexing structure used by that version has a known 
deficiency when working with a large number (>50K) of objects in a group. The 
issue was addressed in HDF5 1.8, but it requires applications to “turn on” the 
latest file format.
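
Since the example quoted below uses h5py, the same setting can presumably be 
enabled from Python by passing libver='latest' to h5py.File (a keyword h5py 
provides for this purpose); the filename here is only illustrative. A minimal 
sketch:

import h5py

# Ask HDF5 for the latest file format so that new groups use the improved
# link indexing introduced in HDF5 1.8 (assumes the installed h5py exposes
# the libver keyword, which recent 2.x releases do).
with h5py.File("f_latest.h5", "w", libver='latest') as hf:
    for i in range(1, 1000000):
        hf.create_group("/Acquisition." + str(i))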

The performance implications of the latest file format are not well 
documented. The HDF Group is aware of the issue and will be addressing it in 
upcoming releases.

Elena
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal  The HDF Group  http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




On Nov 25, 2015, at 7:46 AM, levent_erb...@keysight.com wrote:

Hello all,

The HDF5 FAQ (https://www.hdfgroup.org/HDF5/faq/limits.html) refers to an 
example that creates 100’000 groups in the ‘How many links can be in a group?’ 
section.

My problem is that I need to create at least 1’000’000 groups in a single file, 
and group creation slows down dramatically after about 900’000 groups.
The application is written in C++ with HDF5 1.8.5, running on Windows 7 64-bit 
with 16 GB of RAM.

To investigate this more quickly, I wrote a very simple Python example and I can 
reproduce the issue on an iMac (64-bit, 32 GB RAM, OS X 10.11).
The average time to create 100’000 groups is between 6 and 7 seconds, but it 
becomes about 6 minutes once 900’000 groups have been created!

I suppose I need to configure something in HDF5 to avoid this kind of issue, 
e.g. set a larger cache size, or something else…
I would really appreciate it if someone knows the reason for this behavior!
Here is the Python example with the produced output.
Best regards,
Levent

import h5py as h5
from datetime import datetime

print(h5.version.info)
hf = h5.File("f.h5", "w")
print(str(datetime.now())) # start timestamp

for i in range(1, 1000000):
    hf.create_group("/Acquisition."+str(i)) # create a group
    if not i % 100000:
        print(str(datetime.now()) + ' : ' + str(i)) # timestamp every 100’000 groups created

print(str(datetime.now())) # end timestamp

Summary of the h5py configuration
---------------------------------
h5py    2.5.0
HDF5    1.8.13
Python  3.5.0 (default, Sep 14 2015, 02:37:27) [GCC 4.2.1 Compatible Apple LLVM 
6.1.0 (clang-602.0.53)]
sys.platform    darwin
sys.maxsize     9223372036854775807
numpy   1.10.1

2015-11-25 10:16:48.109794
2015-11-25 10:16:54.340278 : 100000
2015-11-25 10:17:00.661270 : 200000
2015-11-25 10:17:07.006722 : 300000
2015-11-25 10:17:13.435274 : 400000
2015-11-25 10:17:19.829139 : 500000
2015-11-25 10:17:27.221807 : 600000
2015-11-25 10:17:33.599402 : 700000
2015-11-25 10:17:39.979077 : 800000
2015-11-25 10:17:46.284342 : 900000
2015-11-25 10:23:36.377318

