Thanks for the follow-up. When I mentioned "orders of magnitude" I did not have precise numbers; to get them I would have to write a program that does exactly what this program does, but without HDF5: that is, read up to 20 GB of input binary data (one file) and distribute it across several datasets (or several plain files, in the non-HDF5 case), appending a "slab" to them at certain points. So the "orders of magnitude" figure is just my guess.
> So, I guess what I am saying is that I've got to believe it is possible
> to change the way NeXus uses HDF5 and/or the way HDF5 interacts with
> underlying storage (maybe by writing your own VFD),

In this case the systems are Linux x86_64.

Given the way NeXus is designed, I think the behavior provided by the H5Map API is not possible with it. NeXus is designed so that one does not have to deal with HDF5 "IDs"; in plain HDF5 we assemble all of those IDs ourselves, which allows great flexibility, whereas NeXus is more "high-level" than HDF5. You have operations like

1. nxsfile.openData(dataset_name);
2. nxsfile.putSlab(data, start, size);
3. nxsfile.closeData();

which internally maintain a "current" HDF5 dataset ID (there is only one "current"). These operations open and close the dataset (in HDF5 terms) on every call, and that is precisely what I wanted to avoid.

With the H5Map API it is possible to keep open all the datasets that I want to append to. They are closed only at the end, instead of being re-opened and re-closed a multitude of times. I can obtain the HDF5 dataset ID that I want to append to by looking it up in the STL map by path. (A minimal sketch of this caching idea is appended at the end of this message, after the quoted text.)

Pedro

----- Original Message -----
From: Mark C. Miller
To: HDF Users Discussion List
Sent: Tuesday, July 26, 2011 12:34 AM
Subject: Re: [Hdf-forum] H5 Map - a HDF5 chunk caching API

On Mon, 2011-07-25 at 17:49 -0700, Pedro Silva Vicente wrote:
> However, due to the way the NeXus API was used, performance was very
> slow. It was several orders of magnitude slower
> than using a plain binary file to save the experiment results, so one
> question that came up was
>
> "Why should I use HDF5 instead of a binary file, if it's several
> orders of magnitude slower?"
>
> So, I implemented the 2 solutions explained.

Ok, thanks for the context. That helps.

If I recall, though, the best of your two solutions resulted in only about a 65% speedup. That's maybe a 2.2x speedup, which is certainly a step in the right direction but hardly solves the 'orders of magnitude' problem you mention.

Now, I've been using and comparing raw binary I/O and products like HDF5 for many years. In all my experience, and depending on the size of the read/write requests, I have observed and have been willing to tolerate at most a 2-3x performance hit for using a 'high level' I/O library over what is achievable using raw binary I/O interfaces. And, honestly, the performance hit is usually less than 20-25% of raw binary bandwidth. That, however, assumes the HDF5 library is being used properly and, in turn, that the HDF5 library is using the underlying filesystem properly. I have seen situations where one or both are not the case, and indeed I have also seen orders of magnitude loss of performance.

In fact, within the last year, we needed to write a specialized Virtual File Driver (VFD) to get good performance on our BG/P system. Writing a new VFD for HDF5 wasn't necessarily simple, but it was possible, and doing so resulted in a 30-50x performance improvement. On top of that, getting the applications to use HDF5 slightly differently can, I think, give us another 2-3x performance improvement.

So, I guess what I am saying is that I've got to believe it is possible to change the way NeXus uses HDF5 and/or the way HDF5 interacts with underlying storage (maybe by writing your own VFD), to address the 'orders of magnitude' performance hit. Otherwise, I agree with you: why should anyone pay that kind of a price to use it?

Mark
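P.S. For what it's worth, here is a minimal sketch of the caching idea described above. It is not the actual H5Map code: the class name (DatasetCache), the method names, and the assumption of 2-D, double-precision datasets that are chunked and extendible along dimension 0 are all made up for illustration, and error checking is omitted. The only point it shows is that each dataset ID is obtained with H5Dopen2 once, reused for every appended slab, and closed only at the end, instead of being re-opened and re-closed on every write.

#include <hdf5.h>
#include <map>
#include <string>

// Illustrative sketch only -- not the actual H5Map code. Assumes the target
// datasets already exist in the file, are chunked, hold doubles, and are
// extendible (unlimited max size) along dimension 0.
class DatasetCache {
public:
    explicit DatasetCache(hid_t file) : m_file(file) {}

    // Open each dataset once and keep the HDF5 id in an STL map keyed by path.
    hid_t get(const std::string &path) {
        std::map<std::string, hid_t>::iterator it = m_ids.find(path);
        if (it != m_ids.end())
            return it->second;                       // already open: reuse id
        hid_t dset = H5Dopen2(m_file, path.c_str(), H5P_DEFAULT);
        m_ids[path] = dset;
        return dset;
    }

    // Extend the dataset along dimension 0 and write one slab at its end.
    void append_slab(const std::string &path, const double *buf,
                     hsize_t nrows, hsize_t ncols) {
        hid_t dset = get(path);

        // Current extent of the dataset in the file.
        hid_t fspace = H5Dget_space(dset);
        hsize_t dims[2];
        H5Sget_simple_extent_dims(fspace, dims, NULL);
        H5Sclose(fspace);

        // Grow by nrows, then select only the newly added region.
        hsize_t start[2]   = { dims[0], 0 };
        hsize_t count[2]   = { nrows, ncols };
        hsize_t newdims[2] = { dims[0] + nrows, ncols };
        H5Dset_extent(dset, newdims);

        fspace = H5Dget_space(dset);
        H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);
        hid_t mspace = H5Screate_simple(2, count, NULL);
        H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, buf);
        H5Sclose(mspace);
        H5Sclose(fspace);
        // Note: the dataset id is deliberately NOT closed here.
    }

    // Close everything exactly once, at the end of the run.
    ~DatasetCache() {
        for (std::map<std::string, hid_t>::iterator it = m_ids.begin();
             it != m_ids.end(); ++it)
            H5Dclose(it->second);
    }

private:
    hid_t m_file;
    std::map<std::string, hid_t> m_ids;
};

The lookup-by-path map above is the role the STL map plays in H5Map; the NeXus openData/putSlab/closeData path, by contrast, re-opens and re-closes the dataset around every slab.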
