Hi Elena,

I just tried it on a local system with the XFS file system. The same issue happens for H5F_LIBVER_EARLIEST, but for both H5F_LIBVER_18 and H5F_LIBVER_LATEST the bandwidth becomes stable (although still lower than in the NGroup=128 case by a factor of 1.5 to 2).

Please let me know if you can reproduce these results. Thanks!

Justin

2016-02-21 17:54 GMT-06:00 Elena Pourmal <[email protected]>:

> Hi Justin,
>
> Thanks a lot for the program! We will take a look.
>
> Just one more question. Have you tried to run your benchmark on some other
> file system?
>
> Thanks again!
>
> Elena
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Elena Pourmal  The HDF Group  http://hdfgroup.org
> 1800 So. Oak St., Suite 203, Champaign IL 61820
> 217.531.6112
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> On Feb 21, 2016, at 5:05 PM, Hsi-Yu Schive <[email protected]> wrote:
>
> Hi Elena,
>
> A simple code demonstrating this issue is attached. Please try to modify
> the variables "NGroup, LibVerLow, LibVerLow". NGroup gives the number of
> groups for a fixed number of datasets (NDataset), and the other two
> variables specify the file format. The size of each dataset is ~2 KB.
>
> I tried four different cases, with the combination of NGroup=1 or 128 and
> LibVerLow=H5F_LIBVER_EARLIEST or H5F_LIBVER_18. For NGroup=1, the I/O
> bandwidth drops dramatically when the file size exceeds ~ 3.4 GB. For
> NGroup=128, the bandwidth becomes reasonable. The results are similar for
> different LibVerLow (actually the results are a bit worse for H5F_LIBVER_18
> and H5F_LIBVER_LATEST than for H5F_LIBVER_EARLIEST).
>
> Some system spec:
> HDF5 version: 1.8.16
> CPU: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
> File system: gpfs
> OS: CentOS release 6.7
>
> Sincerely,
> Justin
>
> 2016-02-19 17:41 GMT-06:00 Elena Pourmal <[email protected]>:
>
>> Justin,
>>
>> Will it be possible for you to provide a program that illustrates the
>> problem? Which version of the library are you using? On which system are
>> you running your application?
>>
>> Thank you!
>>
>> Elena
>>
>> On Feb 19, 2016, at 4:03 PM, Hsi-Yu Schive <[email protected]> wrote:
>>
>> Thanks for the suggestion. The performance I reported was measured using
>> the earliest file format (i.e., H5F_LIBVER_EARLIEST). I just tried to use
>> H5F_LIBVER_18, but it leads to an even worse performance. The bandwidth
>> starts to drop when N > ~ 0.5 million. Using H5F_LIBVER_LATEST does not
>> help either.
>>
>> Justin
>>
>> 2016-02-19 8:26 GMT-06:00 Gerd Heber <[email protected]>:
>>
>>> Are you using the latest version of the file format? In other words, are
>>> you using H5P_DEFAULT (-> earliest) as your file access property list,
>>> or have you created one which sets the library version bounds to
>>> H5F_LIBVER_18?
>>>
>>> See
>>> https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds
>>>
>>> In the newer version, groups with large numbers of links and attributes
>>> are managed more efficiently.
>>>
>>> Does that solve your problem?
>>>
>>> Best, G.
>>>
>>> *From:* Hdf-forum [mailto:[email protected]] *On
>>> Behalf Of *Hsi-Yu Schive
>>> *Sent:* Thursday, February 18, 2016 2:36 PM
>>> *To:* [email protected]
>>> *Subject:* [Hdf-forum] I/O bandwidth drops dramatically and
>>> discontinuously for a large number of small datasets
>>>
>>> I encounter a sudden drop of I/O bandwidth when the number of datasets
>>> in a single group exceeds around 1.7 million. In the following I describe
>>> the issue in more detail.
>>>
>>> I'm converting adaptive mesh refinement data to HDF5 format. Each
>>> dataset contains a small 4-D array with a size of ~ 10 KB in the compact
>>> format. All datasets are stored in the same group. When the total number of
>>> datasets (N) is smaller than ~ 1.7 million, I get an I/O bandwidth of ~100
>>> MB/s, which is acceptable. However, when N exceeds ~ 1.7 million, the
>>> bandwidth suddenly drops by at least one to two orders of magnitude.
>>>
>>> This issue seems to relate to the **number of datasets per group**
>>> instead of total data size. For example, if I reduce the size of each
>>> dataset by a factor of 5 (so ~2 KB per dataset), the I/O bandwidth still
>>> drops when N > ~ 1.7 million, even though the total data size is reduced by
>>> a factor of 5.
>>>
>>> So I was wondering what causes this issue, and if there is any simple
>>> solution to that. Since the data stored in different datasets are
>>> independent of each other, I prefer not to combine them into a larger
>>> dataset. My current solution is to further create several HDF5 sub-groups
>>> under the main group, and then distribute all datasets evenly in these
>>> sub-groups (so that the number of datasets per group becomes smaller). By
>>> doing so the I/O bandwidth becomes stable even when N > 1.7 million.
>>>
>>> If necessary, I can post a simplified code to reproduce this issue.
>>>
>>> Hsi-Yu
>
> <HDF5_IO_Bandwidth__Justin.cpp>
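
For reference, the sub-group workaround described in the original post (distributing all datasets evenly over several sub-groups) might look roughly like the sketch below. It only shows the index-to-path bookkeeping and makes no HDF5 calls. The function names, the "/Main/Group_<g>/Dataset_<i>" naming scheme, and the round-robin assignment are assumptions made for illustration; they are not taken from the attached program, which may do this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: path of the dataset with the given index when the
// datasets are spread round-robin over n_group sub-groups.
// The "/Main/Group_<g>/Dataset_<i>" naming is an assumption for illustration.
std::string dataset_path(std::size_t index, std::size_t n_group)
{
    assert(n_group > 0);
    const std::size_t group = index % n_group;  // round-robin assignment
    return "/Main/Group_" + std::to_string(group) +
           "/Dataset_" + std::to_string(index);
}

// Hypothetical helper: number of datasets that land in sub-group `group`
// under the round-robin assignment above. Counts differ by at most one.
std::size_t datasets_in_group(std::size_t group, std::size_t n_dataset,
                              std::size_t n_group)
{
    assert(n_group > 0 && group < n_group);
    return n_dataset / n_group + (group < n_dataset % n_group ? 1 : 0);
}
```

As an example with made-up numbers, 2,000,000 datasets over 128 sub-groups gives 15,625 datasets per sub-group, far below the ~1.7 million per group at which the bandwidth drop was observed.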
_______________________________________________ Hdf-forum is for HDF software users discussion. [email protected] http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org Twitter: https://twitter.com/hdf5
