What is HDF5, and why should you use it?

http://www.hdfgroup.org/why_hdf/

(My summary):
- very large data sets, very fast access requirements, and complex datasets
  - share data across variety of platforms
  - many open-source and commercial tools that understand HDF
- self-describing and can specify complex data relationships and dependencies
  - can contain binary data in many representations
- allow direct access to parts of file without first parsing whole contents
- hierarchical data objects can be expressed in natural manner (contrast
    experience with relational database tables)
- n-dimensional datasets and each element in set may be complex object
- relational databases good for field matching queries but not for sequentially processing all records in database or for subsetting data
    based on co-ordinate style lookup
- custom proprietary binary formats often not portable, not extensible and not high-performance; maintaining the data management part of the
    code becomes technical debt

I personally find it useful for storing price data for financial instruments, and also economic data. There are bespoke time series databases, but they come at a price, which is not purely a pecuniary one.

Updated wrappers are here:
https://github.com/Laeeth/d_hdf5

Changes since last time: some fixes to the bindings and updates to a later version of the HDF5 API. There is more to do to make it idiomatically accessible from D, but it's usable today. A simple example of mapping D structs to HDF5 types and back again is in the examples/traits directory.
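
To give a flavour of what mapping a D struct to an HDF5 type involves, here is a minimal sketch. It is not the code from examples/traits and it does not call the wrappers or HDF5 at all. It only uses compile-time reflection from std.traits to list each field's name, byte offset and the matching HDF5 native type name, which is the information the C API needs when a compound type is built with H5Tcreate(H5T_COMPOUND, ...) and H5Tinsert. The PriceBar struct, the Member struct and the hdf5TypeName and describe helpers are all made up for illustration.

```d
import std.stdio : writefln;
import std.traits : FieldNameTuple, Fields;

// Hypothetical record type, for illustration only.
struct PriceBar
{
    long   timestamp;
    double open;
    double close;
    int    volume;
}

// Name of the HDF5 predefined native type matching a D scalar type.
// Only a handful of types are covered in this sketch.
template hdf5TypeName(T)
{
    static if (is(T == int))         enum hdf5TypeName = "H5T_NATIVE_INT32";
    else static if (is(T == long))   enum hdf5TypeName = "H5T_NATIVE_INT64";
    else static if (is(T == float))  enum hdf5TypeName = "H5T_NATIVE_FLOAT";
    else static if (is(T == double)) enum hdf5TypeName = "H5T_NATIVE_DOUBLE";
    else static assert(0, "unsupported field type: " ~ T.stringof);
}

// One member of a compound type: what H5Tinsert would need to be told.
struct Member
{
    string name;
    size_t offset;
    string type;
}

// Walk the fields of T at compile time and collect a description of each.
Member[] describe(T)() if (is(T == struct))
{
    Member[] members;
    alias names = FieldNameTuple!T;
    foreach (i, F; Fields!T)
        members ~= Member(names[i], T.tupleof[i].offsetof, hdf5TypeName!F);
    return members;
}

void main()
{
    foreach (m; describe!PriceBar())
        writefln("%-10s offset %2d  %s", m.name, m.offset, m.type);
}
```

A real mapping would pass these offsets and type handles to HDF5 rather than print them, and would also need to handle strings, arrays and nested structs, which this sketch leaves out.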

Pull requests and offers to help maintain it are welcome. It's still at an alpha stage, but already useful.



Laeeth.
