Hi HDF5-users,

   I am new to HDF5 but an experienced C++ programmer. Having worked with many 
mature open-source libraries, I have noted a few things about the HDF5 C++ API - 
please correct me where I am wrong.

    I am aware that workarounds exist for all the issues I raise, but I am 
simply trying to point out, from experience, areas where I believe the current 
HDF5 C++ API clashes with expectations and with certain ideal(ized) design 
philosophies (IMHO).

    But before I start, please let me express my great appreciation for HDF5 
as a scalable, cross-platform, open-source standard for large-volume 
computational data storage and transfer, and my gratitude for making it 
available as a free download.


ISSUE 1: Excessive/Inappropriate use of TRY-CATCH
-----------------------------------------------
   We are forced to use try-catch blocks like if-else blocks - there is a 
conspicuous absence of query functions for checking whether a group or 
dataset exists - instead, we have to call openGroup() or openDataSet() 
and trap the exception if the call fails.
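   For example, the only way I know to test for a group today is a helper 
like this (a minimal sketch; it assumes H5::Exception::dontPrint() has been 
called so the library doesn't print its error stack on every probe):

   // Existence "query" via exception trapping - the pattern the
   // current API forces on us.
   bool group_exists(H5::CommonFG & parent, std::string const& name)
   {
      try {
         H5::Group g = parent.openGroup(name); // throws if absent
         return true;
      }
      catch(H5::Exception const& ex) {
         return false;  // absence reported as an exception
      }
   }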

   There are a few issues that this creates:

1. It forces an alternate programming approach onto otherwise conventional, 
more meaningful and readable code - exceptions are no longer reserved for 
exceptional conditions (e.g. does the absence of a dataset in an existence 
query really constitute an exception, or just an expected failure case when 
the producer and consumer of the data happen to be different?)

2. It disallows the use of compiler flags like -fno-exceptions in g++, because 
the library depends on exceptions to guarantee correctness. Exceptions 
cause the compiler to include a heavier runtime in the linked executable 
and can have performance implications even if the actual code never throws 
(what the compiler can infer about our code from its static 
analysis is limited). Therefore, by including HDF5 to store partial 
calculation results in my nested loops, I am forced to switch 
from -fno-exceptions to -fexceptions and risk introducing "excess baggage" 
which I could shed previously. This cascades through my whole code.

   In a nutshell, introducing HDF5 into my code has forced a minor 
architectural change across my whole codebase.


ISSUE 2: ACCESSORS AND QUERIES FOR OBJECT (TYPES)
----------------------------------------------

1.   The v1.6.x API allowed querying the type of an object. This allows 
switch-case-break blocks to take action on each sub-item of a group 
depending on the case. For example, it is easy to write a (graphical) HDF5 
file-browser with such an API. IIUC, with v1.8.x, some functions like 
CommonFG::getObjTypeByName() are deprecated. But achieving the above example 
use case will now involve a whole bunch of try-catch blocks, each trying to 
open a different possible type. For example,

   try { Group subgroup = group.openGroup(name); /* do something*/ }
   catch(Exception const& ex) {}

   try { DataSet ds = group.openDataSet(name); /* do something*/ }
   catch(Exception const& ex) {}

Here, if openGroup succeeds, an openDataSet() attempt will still be performed 
unless we use extra flags and if() conditions, possibly with goto statements.

  An equivalent switch-case block is more readable and encloses a logical unit 
of code that performs a well-defined function, namely, branching of control.
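   For contrast, the 1.6-style query makes the branch explicit (a sketch 
from memory, so the exact spellings may be slightly off):

   switch( group.getObjTypeByName(name) )  // deprecated in 1.8
   {
   case H5G_GROUP: {
         Group subgroup = group.openGroup(name); /* do something */
      } break;
   case H5G_DATASET: {
         DataSet ds = group.openDataSet(name); /* do something */
      } break;
   default:
      break;
   }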


2.   To address the above case, it might make sense to introduce different 
iterators as in STL. For example, Group::group_iterator, 
Group::dataset_iterator, DataSet::attribute_iterator (?)

   These iterators obviate the need to manually apply filters to identify each 
child of a parent group. So if there is a need to identify just the datasets 
at the current level, the Group::dataset_iterator would help.
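   Hypothetically, usage could then look like this (dataset_iterator, 
dataset_begin() and dataset_end() do not exist today - they are the 
interface I am imagining):

   for(Group::dataset_iterator it = group.dataset_begin();
       it != group.dataset_end(); ++it)
   {
      DataSet ds = *it;
      // only the datasets at this level appear here - no type checks
   }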


ISSUE 3: WRITE API FOR DATASETS
-------------------------------------

1.   Once a DataSet object is instantiated with a DataType and DataSpace, the 
common-case write of the dataset would normally involve the same datatype 
with which it was created. Why do we need to re-specify it during write()? 
Understandably, this helps with conversions (I don't know much about 
HDF5 conversions). If this is the case, ideally there should *also* be a 
write() member function that takes just one parameter - the pointer to the 
data buffer - because all other information, including the DataType, is 
inferable from the dataset object. As a beginner I was perplexed until I 
caught the "conversions" keyword.
  
   Same with read() - if the on-file DataType is not convertible to the 
DataType of the DataSet object on which read() is being called, then that 
would genuinely constitute an exception.
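   Meanwhile, the one-parameter convenience is easy to emulate with a wrapper 
that supplies the dataset's own type (a sketch; the function name is mine):

   // Sketch of the proposed one-parameter write(), as a free function:
   void write_with_own_type(DataSet & ds, void const* buf)
   {
      ds.write(buf, ds.getDataType());  // memory type == file type
   }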

2.    Writing strings, currently, is a little involved. There could be 
convenience functions named "writeString" or even just "write" that take a 
single string argument. A beginner is faced with questions about fixed-length 
vs variable-length vs character-array (with or without the trailing '\0'?); 
see the sketch below.
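   For instance (a sketch of the convenience I mean - writeString is my name, 
not the library's, and I arbitrarily chose fixed-length storage without the 
terminator):

   void writeString(CommonFG & loc, std::string const& name,
                    std::string const& value)
   {
      // assumes a non-empty value; a zero-length StrType is invalid
      StrType strType(PredType::C_S1, value.size());
      DataSet ds = loc.createDataSet(name, strType,
                                     DataSpace(H5S_SCALAR));
      ds.write(value.c_str(), strType);
   }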

3.   Similarly, writing single integers or floats could be supported by 
functions named writeInt(), writeFloat(), writeUInt() etc., which would be 
useful for attributes and would hide PredType::NATIVE_INT from a beginner. 
Also, I imagine NATIVE_<TYPE> is commonly used, so such convenience functions 
would allow rapid development without a steep learning curve before first use.
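   Internally, each of these would reduce to a one-liner, e.g. (hypothetical 
name, wrapping the existing Attribute::write()):

   void writeInt(Attribute & attr, int value)
   {
      attr.write(PredType::NATIVE_INT, &value);  // PredType stays hidden
   }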

4.   Using the type-traits<> template-based techniques along with partial 
specialization as in STL and BOOST libraries, it is possible to write short, 
simple code that could permit one polymorphic function, say

   template<typename T>
      void writeAtom(H5::Group & g, T const& t, string const& name);

  to write different common atomic types like float, int, string etc. To 
illustrate this, I am attaching .h and .cpp files (inlined below) where the 
functions {write,read}_hdf5_scalar_attribute() are implemented in this way.



ISSUE 4: STANDARD API for COMPLEX TYPES
---------------------------------------

  It is quite common to use complex<float> or complex<double> in mathematical 
calculations, so it would be nice to have predefined datatypes for these. 
Since FORTRAN, C99 and C++ all support complex numbers with up to long double 
precision at the language level, HDF5 support would make life so much easier. 
(The attached H5NativeComplex<T> sketches one way to build these as compound 
types.)


ISSUE 5: H5File API
------------------------

1.    Is there a requirement for CommonFG to be a base class at all? Can't all 
of its operations be collapsed into just the Group class? To do this with a 
file object, just retrieve the root group using file.openGroup("/") and then 
work simply with groups. To annotate the H5File itself with meta-info, 
provide a separate API. Class hierarchies should represent meaningful 
relationships between parents and progeny. The root group in a file is not 
the file itself, and CommonFG is required only when we mix up the two 
definitions (IIUC, IMHO).

2.    The H5File constructor supports some H5F_ACC_? parameters that 
H5File::open() fails with. This is not documented in the DOXYGEN-generated 
API. It forces me to include a whole bunch of code within a try-catch 
block, simply because the H5File object must now be created inside the block 
instead of simply being opened via the open() member function - and is 
therefore visible only inside the try-catch block!

   IMHO, H5File should follow a model similar to ifstream and ofstream for the 
open() and close() functions - while a constructor performs an open(), the 
same open() should also be callable separately with the same H5F_ACC_? flags.
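   That is, something like the following (hypothetical usage - the file name 
is made up, and the point is that open() accepts the same flags as the 
constructor):

   H5File file;                      // visible outside any try block
   try {
      // proposed: open() accepts the same H5F_ACC_? flags as the ctor
      file.open("results.h5", H5F_ACC_RDWR);
   }
   catch(FileIException const& ex) {
      // handle failure; 'file' itself remains in scope
   }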


Thanks,
Manoj Rajagopalan
PhD Candidate, EECS (CSE)
University of Michigan, Ann Arbor
#include "hdf5-utils.h"

using namespace H5;
using namespace std;


// Bind each traits specialization to the corresponding HDF5 predefined type.
template<> DataType const& hdf5_datatype_traits<char>::dataType(PredType::NATIVE_CHAR);
template<> DataType const& hdf5_datatype_traits<unsigned char>::dataType(PredType::NATIVE_UCHAR);
template<> DataType const& hdf5_datatype_traits<int>::dataType(PredType::NATIVE_INT);
template<> DataType const& hdf5_datatype_traits<unsigned int>::dataType(PredType::NATIVE_UINT);
template<> DataType const& hdf5_datatype_traits<float>::dataType(PredType::NATIVE_FLOAT);
template<> DataType const& hdf5_datatype_traits<double>::dataType(PredType::NATIVE_DOUBLE);
template<> DataType const& hdf5_datatype_traits<long double>::dataType(PredType::NATIVE_LDOUBLE);

// These definitions must follow the specializations above so that, within
// this translation unit, static initialization proceeds in order.
H5NativeComplex<float> NATIVE_COMPLEXFLOAT;
H5NativeComplex<double> NATIVE_COMPLEXDOUBLE;
H5NativeComplex<long double> NATIVE_COMPLEXLDOUBLE;

template<> DataType const& hdf5_datatype_traits<complex<float> >::dataType(NATIVE_COMPLEXFLOAT);
template<> DataType const& hdf5_datatype_traits<complex<double> >::dataType(NATIVE_COMPLEXDOUBLE);
template<> DataType const& hdf5_datatype_traits<complex<long double> >::dataType(NATIVE_COMPLEXLDOUBLE);


// Specialization for std::string: stored as a fixed-length string of
// exactly value.length() characters (no trailing '\0').
template<>
void write_hdf5_scalar_attribute<std::string>(H5::Group & group,
                                 std::string const& name,
                                 std::string const& value)
{
	StrType strType(0, value.length());
	Attribute attr = group.createAttribute(name,
	                                       strType,
	                                       DataSpace(H5S_SCALAR));
	attr.write(strType, reinterpret_cast<void const *>(value.c_str()));
}

// Specialization for std::string: T appears only in the return type, so
// the template argument must be spelled out explicitly.
template<>
string read_hdf5_scalar_attribute<string>(Group & group,
                                  string const& name)
{
	Attribute attr = group.openAttribute(name);
	StrType strType = attr.getStrType();
	string value(strType.getSize(), '\0'); // exact length, no extra NUL
	attr.read(strType, reinterpret_cast<void*>(&value[0]));
	return value;
}


/* ---- attached file: hdf5-utils.h ---- */
#ifndef HDF5_UTILS_H
#define HDF5_UTILS_H

#include <H5Cpp.h>
#include <complex>
#include <string>

// Maps a C++ type to its HDF5 DataType; the specializations that bind
// each supported type live in hdf5-utils.cpp.
template<typename T> struct hdf5_datatype_traits {
	static H5::DataType const& dataType;
};

// An HDF5 compound type matching the memory layout of std::complex<T>:
// "real" at offset 0, "imag" immediately after.
template<typename T>
class H5NativeComplex : public H5::CompType
{
public:
	H5NativeComplex() : CompType(sizeof(std::complex<T>)) {
		using namespace H5;
		DataType const& dataType = hdf5_datatype_traits<T>::dataType;
		insertMember(std::string("real"), 0, dataType);
		insertMember(std::string("imag"), dataType.getSize(), dataType);
		pack();
		lock();
	}
};

// complex data types
extern H5NativeComplex<float> NATIVE_COMPLEXFLOAT;
extern H5NativeComplex<double> NATIVE_COMPLEXDOUBLE;
extern H5NativeComplex<long double> NATIVE_COMPLEXLDOUBLE;


// Generic scalar-attribute writer: works for any T with an
// hdf5_datatype_traits specialization.
template<typename T>
void write_hdf5_scalar_attribute(H5::Group & group,
                                 std::string const& name,
                                 T const& value)
{
	using namespace H5;
	DataType const& dataType = hdf5_datatype_traits<T>::dataType;
	Attribute attr = group.createAttribute(name,
	                                       dataType,
	                                       DataSpace(H5S_SCALAR));
	attr.write(dataType, reinterpret_cast<void const *>(&value));
}

// Declaration of the explicit specialization defined in hdf5-utils.cpp
// (no storage class is allowed on an explicit specialization, so no
// 'extern' here).
template<>
void write_hdf5_scalar_attribute<std::string>(H5::Group & group,
                                 std::string const& name,
                                 std::string const& value);


// Generic scalar-attribute reader. T is not deducible from the arguments,
// so callers must write read_hdf5_scalar_attribute<T>(group, name).
template<typename T>
T read_hdf5_scalar_attribute(H5::Group & group,
                             std::string const& name)
{
	using namespace H5;
	DataType const& dataType = hdf5_datatype_traits<T>::dataType;
	Attribute attr = group.openAttribute(name);
	T value;
	attr.read(dataType, reinterpret_cast<void*>(&value));
	return value;
}

template<>
std::string read_hdf5_scalar_attribute<std::string>(H5::Group & group,
                                       std::string const& name);


#endif // HDF5_UTILS_H