Hi,

Try using the H5Pset_libver_bounds function (see https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds) with H5F_LIBVER_LATEST for the second and third arguments to set up a file access property list, and then pass that property list when opening an existing file or creating a new one.
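Since the reproducer in the original message uses h5py, the same setting can probably be tried from Python as well. The following is only a sketch: it assumes that h5py's `libver="latest"` keyword on `h5py.File` corresponds to passing H5F_LIBVER_LATEST for both bounds, and the helper name `create_groups`, the temporary file path, and the small group count are placeholders chosen for illustration.

```python
import os
import tempfile

import h5py


def create_groups(path, count, libver="latest"):
    """Create `count` empty groups in a new file and return how many exist."""
    # libver="latest" asks HDF5 to use the newest file format it supports.
    # Assumption: this matches H5F_LIBVER_LATEST for both the low and high bound.
    with h5py.File(path, "w", libver=libver) as hf:
        for i in range(1, count + 1):
            hf.create_group("/Acquisition." + str(i))
        # Number of links in the root group
        return len(hf)


if __name__ == "__main__":
    # Small count so the sketch runs quickly; the original question
    # needs about 1,000,000 groups.
    path = os.path.join(tempfile.mkdtemp(), "f.h5")
    print(create_groups(path, 1000))
```

To compare the two formats, the same function can be timed once with `libver="earliest"` and once with `libver="latest"` at the group counts that matter for the application.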
Here is a C code snippet:

    /* Create a file access property list and request the latest file format */
    fapl_id = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
    /* Pass the property list as the last argument when creating the file */
    file_id = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl_id);

By default, the HDF5 library uses the earliest version of the file format when creating groups. The indexing structure used for that version has a known deficiency when working with a large number (>50K) of objects in a group. The issue was addressed in HDF5 1.8, but it requires an application to "turn on" the latest file format. The performance implications of the latest file format are not well documented. The HDF Group is aware of the issue and will be addressing it in upcoming releases.

Elena

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Elena Pourmal
The HDF Group
http://hdfgroup.org
1800 So. Oak St., Suite 203, Champaign IL 61820
217.531.6112
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

On Nov 25, 2015, at 7:46 AM, levent_erb...@keysight.com wrote:

Hello all,

The HDF5 FAQ (https://www.hdfgroup.org/HDF5/faq/limits.html) refers to an example that creates 100’000 groups in the ‘How many links can be in a group?’ section. My problem is that I need to create at least 1’000’000 groups in a single file, and group creation slows down a lot after about 900’000.

The application is written in C++ with HDF5 1.8.5, running on 64-bit Windows 7 with 16 GB of RAM. For a faster investigation, I wrote a very simple Python example, and I can reproduce the issue on a 64-bit iMac with 32 GB of RAM and OS X 10.11. Creating 100’000 groups takes 6-7 seconds on average, but it takes about 6 minutes once 900’000 groups have been created!

I suppose that I need to configure something in HDF5 to avoid this kind of issue, e.g. set a larger cache size, or something else. I would really appreciate it if someone knows the reason for this behavior!

Here is the Python example with the produced output.
Best regards,
Levent

    import h5py as h5
    from datetime import datetime

    print(h5.version.info)
    hf = h5.File("f.h5", "w")
    print(str(datetime.now()))  # start timestamp
    for i in range(1, 1000000):
        hf.create_group("/Acquisition." + str(i))  # create a group
        if not i % 100000:
            print(str(datetime.now()) + ' : ' + str(i))  # timestamp after each 100’000 groups created
    print(str(datetime.now()))  # end timestamp

Output:

    Summary of the h5py configuration
    ---------------------------------
    h5py 2.5.0
    HDF5 1.8.13
    Python 3.5.0 (default, Sep 14 2015, 02:37:27)
    [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)]
    sys.platform darwin
    sys.maxsize 9223372036854775807
    numpy 1.10.1

    2015-11-25 10:16:48.109794
    2015-11-25 10:16:54.340278 : 100000
    2015-11-25 10:17:00.661270 : 200000
    2015-11-25 10:17:07.006722 : 300000
    2015-11-25 10:17:13.435274 : 400000
    2015-11-25 10:17:19.829139 : 500000
    2015-11-25 10:17:27.221807 : 600000
    2015-11-25 10:17:33.599402 : 700000
    2015-11-25 10:17:39.979077 : 800000
    2015-11-25 10:17:46.284342 : 900000
    2015-11-25 10:23:36.377318

_______________________________________________
Hdf-forum is for HDF software users discussion.
Hdf-forum@lists.hdfgroup.org
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5