Thanks for the follow-up.

When I mentioned “orders of magnitude” I did not have precise numbers; for that 
I would have to write a program that does exactly what the current program does, 
but without HDF5: that is, read up to 20GB of input binary data (one file) and 
distribute it across several datasets (or several files, in the non-HDF5 case), 
appending a “slab” to each of them at some point. So the “orders of magnitude” 
is just my guess.
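For reference, the plain-binary counterpart I have in mind would be roughly the
sketch below (the file names, chunk size and round-robin rule are only
placeholders for whatever the real program does):

#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    const std::size_t chunk_size = 64 * 1024 * 1024;   // 64 MB read buffer (arbitrary)
    std::vector<char> buffer(chunk_size);

    FILE* input = std::fopen("experiment.raw", "rb");  // up to ~20GB of input data
    FILE* outputs[3];                                   // one plain file per "dataset"
    outputs[0] = std::fopen("detector1.bin", "ab");
    outputs[1] = std::fopen("detector2.bin", "ab");
    outputs[2] = std::fopen("detector3.bin", "ab");

    std::size_t n, i = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), input)) > 0)
    {
        // append the slab to one of the output files (round-robin here,
        // standing in for the real distribution rule)
        std::fwrite(buffer.data(), 1, n, outputs[i % 3]);
        ++i;
    }

    for (int k = 0; k < 3; ++k) std::fclose(outputs[k]);
    std::fclose(input);
    return 0;
}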

>So, I guess what I am saying is that I've got to believe it is possible
>to change the way NeXus uses HDF5 and/or the way HDF5 interacts with
>underlying storage (maybe by writing your own VFD), 

In this case the systems are Linux x86_64. Given the way NeXus is designed, I 
think the behavior provided by the H5Map API is not possible. NeXus is designed 
so that one does not have to deal with “IDs” the way HDF5 does, where you 
assemble all of them together, which gives great flexibility. 
NeXus is more “high-level” than HDF5: you have operations like

1. nxsfile.openData(dataset_name);
2. nxsfile.putSlab( data, start, size );
3. nxsfile.closeData();

which internally maintain a “current” HDF5 dataset ID (there is only one 
“current” at a time). These operations open and close the dataset (in HDF5 
terms) on every call, and that is precisely what I wanted to avoid. With the 
H5Map API it is possible to keep all the datasets I want to append to open; 
they are closed only at the end instead of being re-opened and re-closed a 
multitude of times. I obtain the HDF5 dataset ID I want to append to by looking 
it up in the STL map by path, as in the sketch below.
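
To make it concrete, here is a rough sketch of the idea (this is not the actual
H5Map code; the names are mine, and it assumes extendible 1-D chunked datasets
of doubles):

#include <hdf5.h>
#include <map>
#include <string>

// Keep the HDF5 dataset IDs open for the whole run, keyed by path,
// instead of re-opening/re-closing them for every slab.
std::map<std::string, hid_t> open_datasets;

hid_t get_dataset(hid_t file_id, const std::string& path)
{
    std::map<std::string, hid_t>::iterator it = open_datasets.find(path);
    if (it != open_datasets.end())
        return it->second;                      // already open: just look it up
    hid_t dset = H5Dopen2(file_id, path.c_str(), H5P_DEFAULT);
    open_datasets[path] = dset;
    return dset;
}

// Append 'count' elements to the extendible 1-D dataset at 'path'.
void append_slab(hid_t file_id, const std::string& path,
                 const double* data, hsize_t count)
{
    hid_t dset = get_dataset(file_id, path);

    // current size -> new size
    hid_t fspace = H5Dget_space(dset);
    hsize_t dims[1];
    H5Sget_simple_extent_dims(fspace, dims, NULL);
    H5Sclose(fspace);

    hsize_t new_dims[1] = { dims[0] + count };
    H5Dset_extent(dset, new_dims);

    // select the newly added region and write the slab into it
    fspace = H5Dget_space(dset);
    hsize_t start[1] = { dims[0] };
    hsize_t cnt[1]   = { count };
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, cnt, NULL);
    hid_t mspace = H5Screate_simple(1, cnt, NULL);

    H5Dwrite(dset, H5T_NATIVE_DOUBLE, mspace, fspace, H5P_DEFAULT, data);

    H5Sclose(mspace);
    H5Sclose(fspace);
}

void close_all_datasets()
{
    // only at the very end of the run
    for (std::map<std::string, hid_t>::iterator it = open_datasets.begin();
         it != open_datasets.end(); ++it)
        H5Dclose(it->second);
    open_datasets.clear();
}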

Pedro


  ----- Original Message ----- 
  From: Mark C. Miller 
  To: HDF Users Discussion List 
  Sent: Tuesday, July 26, 2011 12:34 AM
  Subject: Re: [Hdf-forum] H5 Map - a HDF5 chunk caching API



  On Mon, 2011-07-25 at 17:49 -0700, Pedro Silva Vicente wrote:


  > However, due to the way the NeXus API was used, performance was very
  > slow. It was several orders of magnitude slower 
  > than using a plain binary file to save the experiment results, so one
  > question that came up was
  >  
  > “Why should I use HDF5 instead of a binary file, if it’s several
  > orders of magnitude slower?”
  >  
  > So, I implemented the 2 solutions explained.

  Ok, thanks for context. That helps. If I recall though, the best of your
  two solutions resulted in only about a 65% speedup. That's maybe a 2.2x
  speedup, which is certainly a step in the right direction but hardly
  solves the 'orders of magnitude' problem you mention.

  Now, I've been using and comparing raw binary I/O and products like HDF5
  for many years. In all my experiences and depending on the size of the
  read/write requests, I have observed and have been willing to tolerate
  at most a 2-3x performance hit for using a 'high level' I/O library over
  what is achievable using raw binary I/O interfaces. And, honestly, the
  performance hit is usually less than 20-25% of raw binary bandwidth.
  That, of course, assumes the HDF5 library is being used properly and, in
  turn, that the HDF5 library is using the underlying filesystem properly. I
  have seen situations where one or both of those are not the case, and indeed
  I have also seen orders-of-magnitude losses of performance. In fact, within
  the last year, we needed to write our own specialized Virtual File Driver
  (VFD) to get good performance on our BG/P system. Writing a new VFD for
  HDF5 wasn't necessarily simple, but it was possible, and doing so
  resulted in a 30-50x performance improvement. On top of that, getting the
  applications to use HDF5 slightly differently can, I think, give us
  another 2-3x performance improvement.

  So, I guess what I am saying is that I've got to believe it is possible
  to change the way NeXus uses HDF5 and/or the way HDF5 interacts with
  underlying storage (maybe by writing your own VFD), to address the
  'orders of magnitude' performance hit. Otherwise, I agree with you, why
  should anyone pay that kind of a price to use it?

  Mark



