Hi Manoj,

On Apr 26, 2010, at 1:00 PM, Manoj Rajagopalan wrote:

> Hi HDF5-users,
> 
>   I am new to HDF5 but an experienced C++ programmer. Having worked with many 
> mature open-source libraries I note a few things about the HDF5 C++ API - 
> please correct me where I am wrong.
> 
>    I am aware that workarounds exist for all the issues I raise, but I am 
> simply trying to point out, from experience, areas where I believe the current 
> HDF5 C++ API clashes with expectations and with certain ideal(ized) design 
> philosophies (IMHO).
> 
>    But before I start, please let me express my great appreciation for HDF5 
> as a scalable, cross-platform, open-source standard for large-volume 
> computational data storage and transfer, and my gratitude for making it 
> available as a free download.

        Thank you for taking the time to write such a detailed and valuable 
critique of the issues you see with the C++ wrappers; we really appreciate it!

        I've included comments below to address individual points, but I'd also 
like to introduce a new topic for discussion:  how valuable are the current C++ 
wrappers to experienced C++ developers?  I don't think they add much value, 
because the underlying C layer is reasonably object-oriented and is callable 
directly from C++.  Would the user community be OK with deprecating them and 
opening the floor to a newer, community-driven (and probably 
community-developed) set of C++ bindings?

        Quincey

> ISSUE 1: Excessive/Inappropriate use of TRY-CATCH
> -----------------------------------------------
>   We are forced to use try-catch blocks like if-else blocks - there is a 
> conspicuous absence of query functions to check whether a group or 
> dataset exists - instead, we have to call the openGroup or openDataSet 
> functions and trap the exception when the open fails.
> 
>   There are a few issues that this creates:
> 
> 1. It forces an alternate programming style onto otherwise conventional, 
> more meaningful and readable code - exceptions are no longer reserved for 
> exceptional conditions (e.g. does the absence of a dataset in an existence 
> query really constitute an exception, or just an expected failure case when 
> the producer and consumer of the data just happen to be different?)
> 
> 2. It disallows the use of compiler flags like -fno-exceptions in g++, 
> because the library depends on exceptions to guarantee correctness. Exceptions 
> cause the compiler to include a heavier runtime in the linked executable and 
> can have performance implications even if the actual code doesn't use these 
> features (what the compiler can infer about our code from its static analysis 
> is limited). Therefore, by including HDF5 to store partial calculation results 
> in my nested loops, I am forced to switch from -fno-exceptions to -fexceptions 
> and risk introducing "excess baggage" which I could shed previously. This has 
> a cascade effect on my whole code.
> 
>   In a nutshell, introducing HDF5 into my code has caused a minor 
> architecture change to my whole code.

        Yes, I think we went a bit overboard with exceptions in the current C++ 
wrappers.  :-)  Do you have a suggestion for changing them to avoid exceptions?
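One possible direction (a sketch only, not a committed design): pair each throwing open call with a non-throwing query, or return an optional. The `File`/`Group` stand-ins and the `tryOpenGroup` name below are hypothetical; a real version would delegate to the HDF5 C layer (e.g. H5Lexists) rather than a std::map.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins for the wrapper classes, only to show the calling
// convention; a real version would consult the HDF5 C layer instead of a map.
struct Group { std::string name; };

struct File {
    std::map<std::string, Group> groups;

    // Non-throwing existence query, analogous to what H5Lexists provides.
    bool hasGroup(const std::string& name) const {
        return groups.count(name) != 0;
    }

    // Optional-returning open: an empty optional signals "not found"
    // without raising an exception.
    std::optional<Group> tryOpenGroup(const std::string& name) const {
        auto it = groups.find(name);
        if (it == groups.end()) return std::nullopt;
        return it->second;
    }
};
```

Callers could then write `if (auto g = file.tryOpenGroup("data")) { ... }`, and exceptions would stay reserved for genuinely unexpected failures (I/O errors, corruption).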

> ISSUE 2: ACCESSORS AND QUERIES FOR OBJECT (TYPES)
> ----------------------------------------------
> 
> 1.   The v1.6.x API allowed querying the type of the object. This allows 
> switch-case-break blocks to take actions on each sub-item of a group 
> depending on the case. For example, it is easy to write a (graphical) HDF5 
> file-browser with such API. IIUC, with v1.8.x, some functions like 
> CommonFG::getObjtypeByName() are deprecated. But, achieving the above example 
> use case will now involve a whole bunch of try-catch blocks, each trying to 
> open a different possible type. For example,
> 
>   try { Group subgroup = group.openGroup(name); /* do something*/ }
>   catch(Exception const& ex) {}
> 
>   try { DataSet ds = group.openDataSet(name); /* do something*/ }
>   catch(Exception const& ex) {}
> 
> Here, if openGroup() succeeds, an openDataSet() attempt will still be 
> performed unless we use extra flags and if() conditions, possibly with goto 
> statements.
> 
>  An equivalent switch-case block is more readable and encloses a logical unit 
> of code that performs a well-defined function, namely, branching of control.

        Hmm, the new H5O* routines in the 1.8 release 
(http://www.hdfgroup.org/HDF5/doc/RM/RM_H5O.html) haven't been added to the C++ 
wrappers yet, but I think they should address your concerns here, particularly 
H5Oget_info_by_name 
(http://www.hdfgroup.org/HDF5/doc/RM/RM_H5O.html#Object-GetInfoByName).  You 
might also look at H5Lexists and H5Oexists_by_name (which is new for the 1.8.5 
release and is not included in the online documentation yet).
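To make the browser case concrete: at the C level one would call H5Oget_info_by_name and switch on the type field of the returned H5O_info_t (H5O_TYPE_GROUP, H5O_TYPE_DATASET, H5O_TYPE_NAMED_DATATYPE). The sketch below shows just that control-flow shape, with a stand-in enum so it compiles without the HDF5 headers:

```cpp
#include <cassert>
#include <string>

// Stand-in for H5O_type_t; real code would fill this from the H5O_info_t
// returned by H5Oget_info_by_name.
enum class ObjType { Group, Dataset, NamedDatatype, Unknown };

// One switch replaces the chain of try/catch probes: each case handles
// exactly one object kind and control flow stays linear.
std::string describe(ObjType t) {
    switch (t) {
        case ObjType::Group:         return "group";
        case ObjType::Dataset:       return "dataset";
        case ObjType::NamedDatatype: return "named datatype";
        default:                     return "unknown";
    }
}
```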

> 2.   To address the above case, it might make sense to introduce different 
> iterators as in STL. For example, Group::group_iterator, 
> Group::dataset_iterator, DataSet::attribute_iterator (?)
> 
>   These iterators obviate the need to manually apply filters to identify each 
> child of a parent group. So if there is a need to identify just the datasets 
> at the current level, the Group::dataset_iterator would help.

        I think that's an interesting and useful idea.
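As a rough illustration of what such an iterator would buy (all names hypothetical): a helper that yields only the dataset children, i.e. the filtering a Group::dataset_iterator would hide behind operator++. The child listing is mocked as a vector so the sketch is self-contained:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical child record: name plus kind tag (mocked; a real
// implementation would walk the group via the C API).
enum class Kind { Group, Dataset };
using Child = std::pair<std::string, Kind>;

// Collect just the dataset children, so call sites need no manual filter.
std::vector<std::string> datasetNames(const std::vector<Child>& children) {
    std::vector<std::string> out;
    for (const auto& c : children)
        if (c.second == Kind::Dataset) out.push_back(c.first);
    return out;
}
```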

> ISSUE 3: WRITE API FOR DATASETS
> -------------------------------------
> 
> 1.   Once a DataSet object is instantiated with a DataType and DataSpace, the 
> common-case write of the dataset would normally involve the same datatype 
> with which it was created. Why do we need to restate it during write()? 
> Understandably, this would help with conversions (I don't know much about 
> HDF5 conversions). If this is the case, ideally there should *also* be a 
> write() member function that takes just one parameter - the pointer to the 
> data buffer - because all other information, including the DataType, is 
> inferable from the dataset object. As a beginner I was perplexed until I 
> came across the "conversions" keyword.
> 
>   Same thing with read() - if the in-file DataType is not convertible to the 
> DataType of the DataSet object on which read() is being called, then that 
> would genuinely constitute an exception.

        It might be nice to make this smoother, but an important aspect of HDF5 
is the datatype conversions available.

> 2.    Writing strings, currently, is a little involved. There could be 
> convenience functions named "writeString" or even just "write" that take one 
> string arg. A beginner is faced with questions about fixed-length vs 
> variable-length vs character-array (with or without the trailing '\0'?)
> 
> 3.   Similarly, writing single integers or floats could be supported using 
> functions named writeInt(), writeFloat(), writeUInt() etc. which would be 
> useful for attributes and would hide PredType::NATIVE_INT from a beginner. 
> Also, I imagine NATIVE_<TYPE> is commonly used, so such convenience functions 
> would allow rapid development without a steep learning curve before first use.

        Points taken, thanks! :-)

> 4.   Using the type-traits<> template-based techniques along with partial 
> specialization as in STL and BOOST libraries, it is possible to write short, 
> simple code that could permit one polymorphic function, say
> 
>   template<typename T>
>      void writeAtom(H5::Group & g, T const& t, string const& name);
> 
>  to write different common atomic types like float, int, string etc. To 
> illustrate this I am attaching .h and .cpp files where the functions 
> {write,read}_hdf5_scalar_attribute() are implemented in this way.

        Nifty, thanks again!
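For readers following along, a self-contained sketch of the traits technique (this is not Manoj's attached code; the string tags stand in for the PredType::NATIVE_* constants):

```cpp
#include <cassert>
#include <string>

// Primary template left undefined: using an unsupported type is a
// compile-time error rather than a runtime surprise.
template <typename T> struct hdf5_atom_traits;

// Specializations map each supported C++ type to its HDF5 native type;
// the strings stand in for PredType::NATIVE_* so the sketch compiles
// without the HDF5 headers.
template <> struct hdf5_atom_traits<int> {
    static std::string native() { return "NATIVE_INT"; }
};
template <> struct hdf5_atom_traits<float> {
    static std::string native() { return "NATIVE_FLOAT"; }
};
template <> struct hdf5_atom_traits<double> {
    static std::string native() { return "NATIVE_DOUBLE"; }
};

// One generic entry point: the datatype is chosen at compile time, so
// callers never spell out PredType themselves.
template <typename T>
std::string writeAtom(const std::string& name, const T& /*value*/) {
    return name + " as " + hdf5_atom_traits<T>::native();
}
```

The design choice here is full specialization per atomic type; the same dispatch could also drive string and complex overloads.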

> ISSUE 4: STANDARD API for COMPLEX TYPES
> ---------------------------------------
> 
>  It is quite common to use complex<float> or complex<double> in mathematical 
> calculations so it would be nice to have predefined datatypes for these. 
> Since FORTRAN, C99 and C++ all support complex with up to long double 
> precision at the language level, HDF5-support would make life so much easier.

        We are planning to extend the predefined HDF5 datatypes to support 
complex datatypes in the 1.10.0 release.  (Although we haven't absolutely 
committed to this yet, since it's a fair bit of work.)

> ISSUE 5: H5File API
> ------------------------
> 
> 1.    Is there a requirement for CommonFG to be a base-class at all? Can't 
> all 
> included operations be collapsed into just the Group class? To do this with a 
> file object, just retrieve the root group using file.openGroup("/") and then 
> work simply with groups. To annotate the H5File itself with meta-info, 
> provide a separate API. Class hierarchies should represent meaningful 
> relationships between parents and progeny. The root group in a file is not 
> the file itself and CommonFG is required only when we mix up the two 
> definitions (IIUC, IMHO).

        Hmm, I'm not certain why we implemented things this way, but you do 
have a good point.

> 2.    The H5File constructor supports some H5F_ACC_? parameters that 
> H5File::open() fails with. This is not documented in the Doxygen-generated 
> API documentation. This forces me to include a whole bunch of code within a 
> try-catch block simply because the H5File object must now be created inside 
> the block instead of simply using the open() member function - and is 
> therefore visible only inside the try-catch block!
> 
>   IMHO, H5File should follow a model similar to ifstream and ofstream for the 
> open() and close() functions - while a constructor performs an open(), the 
> latter can also be performed separately with the same H5F_ACC_? flags.

        I think this is a bug, yes.
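For reference, the ifstream model Manoj proposes, shown with std::ifstream itself: the object is declared before any error handling and opened as a separate step, so its scope is not confined to a try block. An H5File analogue would accept the same H5F_ACC_? flags in both the constructor and open():

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>

// Declare outside any try block, open afterwards: the object's lifetime
// is decoupled from error handling, which is what is being asked of
// H5File::open().
bool openSeparately(const char* path) {
    std::ofstream(path) << "x";   // create a small file to open

    std::ifstream in;             // default-constructed, not yet open
    in.open(path);                // opening is a separate step
    bool ok = in.is_open();
    in.close();
    std::remove(path);
    return ok;
}
```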

> Thanks,
> Manoj Rajagopalan
> PhD Candidate, EECS (CSE)
> University of Michigan, Ann Arbor
> <hdf5-utils.cpp><hdf5-utils.h>
> _______________________________________________
> Hdf-forum is for HDF software users discussion.
> [email protected]
> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org

