Tim, Happy New Year! I'm not aware of any comparative study. 
(It'd be comparing apples and oranges: HDF5 is a smart data container.
XML is a document/message format.) Please add it to the Mendeley HDF group
(http://www.mendeley.com/groups/3317921/hdf/papers/) if you happen to come
across something.

Have you considered a hybrid approach, e.g., XDMF or SDCubes?

http://www.mendeley.com/catalog/enhancements-extensible-data-model-format-xdmf/

http://www.mendeley.com/catalog/adaptive-informatics-multifactorial-highcontent-biological-data/
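A minimal sketch of the hybrid idea, assuming Python's standard `xml.etree.ElementTree`. The light metadata stays in XML, and the heavy array is referenced as `file:/dataset` inside an HDF5 file, in the style of an XDMF `DataItem` with `Format="HDF"`. This is only a fragment (no Topology/Geometry, not validated against the XDMF schema), and the file name, dataset path, and attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET


def xdmf_style_descriptor(h5_file: str, dataset_path: str, shape: tuple) -> str:
    """Return an XDMF-style XML fragment whose array data lives in HDF5.

    Only metadata is XML; the DataItem text is a "file:/dataset" reference,
    so the array itself is never base64-encoded into the XML.
    Hypothetical names throughout; not a complete XDMF document.
    """
    root = ET.Element("Xdmf", Version="3.0")
    domain = ET.SubElement(root, "Domain")
    grid = ET.SubElement(domain, "Grid", Name="sensor_run", GridType="Uniform")
    attr = ET.SubElement(grid, "Attribute", Name="temperature", Center="Node")
    item = ET.SubElement(
        attr,
        "DataItem",
        Format="HDF",  # heavy data stays in an HDF5 file
        NumberType="Float",
        Precision="8",
        Dimensions=" ".join(str(n) for n in shape),
    )
    item.text = f"{h5_file}:{dataset_path}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(xdmf_style_descriptor("run_0001.h5", "/sensors/temperature", (1000, 3)))
```

Users and XDMF-aware tools (ParaView, for example) get readable XML metadata, while the arrays are read straight from HDF5 with its own slicing and compression.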

My main concern would be that a pure XML approach will force you to
reinvent (and maintain!) a lot of infrastructure in XML that's built into HDF5
and transparent to end users. Not only will it not perform at the level HDF5
does, it'll also confuse your users. For example, storing binary values as
base64-encoded, compressed text is OK, as long as you always want to
decompress the entire value and never just a subset of it. Would you really
want to mimic chunking/tiling in XML?
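To make the base64 point concrete, here is a minimal stdlib-only sketch (element and attribute names are hypothetical). With one compressed blob, fetching a single value means decoding and decompressing everything. Getting random access back means hand-rolling per-chunk blobs, which is exactly the chunking infrastructure HDF5 already provides.

```python
import base64
import struct
import xml.etree.ElementTree as ET
import zlib


def encode_whole(values: list) -> ET.Element:
    """One base64 blob of zlib-compressed little-endian doubles."""
    raw = struct.pack(f"<{len(values)}d", *values)
    elem = ET.Element("values", encoding="base64+zlib", count=str(len(values)))
    elem.text = base64.b64encode(zlib.compress(raw)).decode("ascii")
    return elem


def read_one_whole(elem: ET.Element, index: int) -> float:
    # The entire blob must be decoded and decompressed to get one value.
    raw = zlib.decompress(base64.b64decode(elem.text))
    return struct.unpack_from("<d", raw, index * 8)[0]


def encode_chunked(values: list, chunk: int) -> ET.Element:
    """Hand-rolled 'chunking': one compressed blob per fixed-size chunk."""
    elem = ET.Element("values", encoding="base64+zlib", chunk=str(chunk))
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        raw = struct.pack(f"<{len(part)}d", *part)
        child = ET.SubElement(elem, "chunk", start=str(start))
        child.text = base64.b64encode(zlib.compress(raw)).decode("ascii")
    return elem


def read_one_chunked(elem: ET.Element, index: int) -> float:
    chunk = int(elem.get("chunk"))
    child = elem[index // chunk]  # only this chunk is decoded/decompressed
    raw = zlib.decompress(base64.b64decode(child.text))
    return struct.unpack_from("<d", raw, (index % chunk) * 8)[0]
```

Even the chunked variant only saves decompression work: `ElementTree` still parses the whole document, whereas HDF5 can seek directly to the chunks a read actually touches.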

Best, G.

-----Original Message-----
From: Hdf-forum On Behalf Of Tim
Sent: Tuesday, December 31, 2013 5:06 PM
To: HDF Forum
Subject: [Hdf-forum] HDF5 vs. XML

We are trying to better understand the relative merits of the XML and HDF5
file formats for a new project. Does anyone know of papers or studies,
qualitative or quantitative, that looked at parameters that might affect such
a decision?

The project needs to store equipment sensor data covering specified time
periods, along with metadata about the data and the equipment. There will be
many thousands of files, which may contain binary data and matrices.

XML is the default choice, chiefly because it is ubiquitous and has a rich
supporting toolset. This translates directly into lower development and
maintenance costs. But as file size, the amount of binary data, and the number
of matrices increase, XML becomes less efficient to work with.

NOTE 1: Because XML can be compressed, resulting in much smaller files, for
the purposes of our investigation we are treating compressed XML as a separate
file format, cXML.

NOTE 2: We plan to use base64 encoding for binary data in XML.
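A quick way to see what NOTE 1 and NOTE 2 imply for file size, as a stdlib-only sketch (the `<reading>` element is hypothetical). base64 turns every 3 bytes into 4 characters, so binary data grows by about a third before any XML markup is added; gzip, used here as one possible way to produce cXML, can win some or all of that back depending on how compressible the data are. Also note that a plain gzip stream has no random access, so reaching any part of a cXML file means decompressing everything before it.

```python
import base64
import gzip
import math
import os


def expected_base64_len(n_bytes: int) -> int:
    """base64 emits 4 output characters per 3 input bytes (with padding)."""
    return 4 * math.ceil(n_bytes / 3)


def size_report(payload: bytes) -> dict:
    """Sizes in bytes of one payload stored as raw, base64, XML, and gzipped XML."""
    b64 = base64.b64encode(payload).decode("ascii")
    xml_doc = f'<reading encoding="base64">{b64}</reading>'.encode("utf-8")
    return {
        "raw": len(payload),
        "base64": len(b64),
        "xml": len(xml_doc),
        "cxml_gzip": len(gzip.compress(xml_doc)),
    }


if __name__ == "__main__":
    # Random bytes stand in for noisy sensor data (a worst case for compression).
    print(size_report(os.urandom(30_000)))
```

Running it on your own representative data, rather than random bytes, is what will show how much cXML actually saves.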

Parameters we feel are important include:

1. Time to create the files.
2. File sizes.
3. Time to read the files.

Our plan is to generate fictitious but representative data files with various
sizes, amounts of binary data, and numbers of matrices, and to record the above
parameters. Mapping this information to our use cases should then give us
usable empirical data with which to make a better-informed decision about file
formats.
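The measurement plan above could start from something like this stdlib-only sketch, which records the three parameters (create time, file size, read time) for plain XML and gzip-compressed cXML. The record layout, attribute names, and the `i % 97` fake data are all hypothetical. An HDF5 variant (for example via h5py) would have to be added and timed the same way; it is not shown here.

```python
import base64
import gzip
import os
import struct
import tempfile
import time
import xml.etree.ElementTree as ET


def make_document(n_values: int) -> ET.Element:
    """Fictitious sensor record: a little metadata plus one base64 array."""
    root = ET.Element("record", equipment="pump-07", start="2014-01-01T00:00:00Z")
    values = [float(i % 97) for i in range(n_values)]
    data = ET.SubElement(root, "data", encoding="base64", dtype="<f8",
                         count=str(n_values))
    raw = struct.pack(f"<{n_values}d", *values)
    data.text = base64.b64encode(raw).decode("ascii")
    return root


def benchmark(path: str, n_values: int, compress: bool) -> dict:
    """Measure create time (build + write), file size, and read time (parse + decode)."""
    opener = gzip.open if compress else open
    t0 = time.perf_counter()
    tree = ET.ElementTree(make_document(n_values))
    with opener(path, "wb") as f:
        tree.write(f, encoding="utf-8", xml_declaration=True)
    create_s = time.perf_counter() - t0

    size = os.path.getsize(path)

    t0 = time.perf_counter()
    with opener(path, "rb") as f:
        root = ET.parse(f).getroot()
    data = root.find("data")
    values = struct.unpack(f"<{int(data.get('count'))}d",
                           base64.b64decode(data.text))
    read_s = time.perf_counter() - t0
    assert len(values) == n_values
    return {"create_s": create_s, "size_bytes": size, "read_s": read_s}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for n in (1_000, 100_000):
            for compress in (False, True):
                name = f"run_{n}.{'xml.gz' if compress else 'xml'}"
                print(name, benchmark(os.path.join(d, name), n, compress))
```

Wall-clock timings on small files are noisy, so repeating each measurement and taking the median would make the comparison fairer.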

This study should also give us some insight into the technical issues involved
in supporting an HDF5 capability, which will need to be factored in.

Comments/thoughts on the above are appreciated.

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org