Hi,
sorry for the late reply, but I have been on holiday. The link for NeXus
is www.nexusformat.org
NeXus started using HDF-4, added HDF-5 when it became available, and XML
at the request of people
who wish to edit their data in emacs... Nowadays we focus on HDF-5.
For your use case: it depends
on the size of the arrays. With the NeXus API we can handle something
like 400x2000 and probably
up to 10x more without problems. But there is a caveat: in order to get
there we hacked the XML-parser.
And this is the general thing with XML: it was never intended for array
data. Everything you do for array
data in XML has to be maintained by you. Thus, if your focus is arrays
you are better off with HDF-5.
C, F77, and Java bindings come out of the box, Python works nicely, and you can load
HDF-5 into MATLAB with a single call...
Best Regards,
Mark Koennecke, for the NeXus International Advisory Committee
On 01/03/2014 02:32 AM, Ted Habermann wrote:
This might be a better link on the NeXus XML:
http://download.nexusformat.org/doc/html/nxdl.html
On Jan 2, 2014, at 6:07 PM, Ted Habermann wrote:
Tim,
I would agree with Gerd that this comparison is a bit of apples and
oranges…
I do a lot of XML and, in fact, many people consider me to be an XML
zealot, so I would agree that there are a lot of tools out there in
XML Land. However, I am not familiar with any tools for dealing with
binary data packed in XML (they may be there, but I am not familiar
with them). The “available tools” point is, therefore, a bit hard to
understand in this context…
You mention compression of XML. Gerd is correct that this is whole-file
compression. You need to uncompress the whole file in order to
do anything with it. The compression approach used in HDF is much
more intelligent. It compresses different datasets in the file
independently and uncompresses only what you need. This optimizes
file sizes and access speeds.
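
To make the per-dataset compression point concrete, here is a minimal sketch, assuming Python with the h5py and numpy packages installed. The dataset names, the array shape, and the chunk size are invented for illustration and are not taken from anyone's actual project.

```python
import h5py
import numpy as np

# Hypothetical sensor matrix: 2000 time steps x 400 channels.
data = np.tile(np.arange(400, dtype="f8"), (2000, 1))

# driver="core" with backing_store=False keeps the file in memory,
# so this sketch leaves nothing on disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Each dataset has its own chunk layout and compression filter.
    f.create_dataset("sensor/readings", data=data,
                     chunks=(100, 400), compression="gzip")
    # A second dataset in the same file, stored without compression.
    f.create_dataset("sensor/timestamps", data=np.arange(2000, dtype="i8"))

    dset = f["sensor/readings"]
    # Only the chunk covering rows 500..509 is read and decompressed.
    subset = dset[500:510, :]
    stored_bytes = dset.id.get_storage_size()

print(subset.shape, stored_bytes, data.nbytes)
```

The slice is served from a single 100-row chunk, so the rest of the dataset stays compressed. How much the storage shrinks depends entirely on the data; the regular values used here compress far better than real sensor readings are likely to.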
You also mention thousands of files. HDF would almost certainly give you
many more aggregation options than XML, with groups and potentially
virtual datasets that provide an access framework for groups of files…
XML is really great for metadata and we are doing quite a bit of work
with XML representations of the metadata in HDF files. This involves
an HDF tool for extracting the metadata in XML for processing
independent of the data. Gerd mentioned a couple of similar projects.
I would add NeXus, which is doing quite a bit with XML and HDF (see
http://download.nexusformat.org/doc/html/design.html and other
related pages)…
Jim Collins has written about the “Tyranny of the Or” where
organizations decide between X and Y. This contrasts with the “Power
of the And”. I would encourage you to think about how XML and HDF can
most effectively be used together rather than trying to choose
between them…
Ted
By the way, you mentioned that you are storing sensor data. I worked
with many sensor projects in NOAA and am curious about whether you
are considering sensorML
(http://www.opengeospatial.org/standards/sensorml) for your metadata.
On Jan 2, 2014, at 8:53 AM, Gerd Heber wrote:
Tim, Happy New Year! I'm not aware of any comparative study.
(It'd be comparing apples and oranges: HDF5 is a smart data container.
XML is a document/message format.) Please add it to the Mendeley HDF group
(http://www.mendeley.com/groups/3317921/hdf/papers/) if you happen to come
across something.
Have you considered a hybrid approach, e.g., XDMF or SDCubes?
http://www.mendeley.com/catalog/enhancements-extensible-data-model-format-xdmf/
http://www.mendeley.com/catalog/adaptive-informatics-multifactorial-highcontent-biological-data/
My main concern would be that a pure XML approach will force you to
reinvent (and maintain!) a lot of infrastructure in XML that's built
into HDF5
and that's transparent to end users: Not only will it not perform at
the level HDF5 does,
it'll also confuse your users. E.g., using base64 encoded,
compressed binary values is ok,
as long as you always want to decompress the entire value and not just
subsets of it. Would you really want to mimic chunking/tiling in XML?
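
The subset problem can be sketched with the Python standard library alone. The element name `matrix`, its `rows`/`cols` attributes, and the array size are assumptions made up for this example, not a schema anyone in this thread has proposed.

```python
import base64
import zlib
import xml.etree.ElementTree as ET
from array import array

ROWS, COLS = 2000, 400

# Hypothetical matrix of doubles, flattened row by row.
values = array("d", (float(i % COLS) for i in range(ROWS * COLS)))
raw = values.tobytes()

# Writer: compress the whole value, then base64-encode it into one element.
root = ET.Element("matrix", rows=str(ROWS), cols=str(COLS))
root.text = base64.b64encode(zlib.compress(raw)).decode("ascii")
document = ET.tostring(root)

# Reader: even for a single row, the entire value has to be
# base64-decoded and inflated before any bytes can be sliced out.
parsed = ET.fromstring(document)
inflated = zlib.decompress(base64.b64decode(parsed.text))
row_bytes = COLS * values.itemsize
row = array("d")
row.frombytes(inflated[500 * row_bytes:501 * row_bytes])

print(len(inflated), len(row))
```

Avoiding the full inflate would mean splitting the value into separately compressed pieces and keeping an index of them, which is the chunking/tiling infrastructure you would then have to design and maintain yourself.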
Best, G.
-----Original Message-----
From: Hdf-forum On Behalf Of Tim
Sent: Tuesday, December 31, 2013 5:06 PM
To: HDF Forum
Subject: [Hdf-forum] HDF5 vs. XML
We are trying to better understand the relative merits of using XML or
HDF5 file formats for a new project. Does anyone know of papers
and/or studies, either qualitative or quantitative, that looked
at parameters that might affect such a decision?
The project needs to store equipment sensor data covering specified
time periods along with metadata about the data and equipment. There
will be many thousands of files, which may contain binary data and matrices.
XML is the default selection, chiefly because it is ubiquitous and
there is a rich toolset supporting it. This translates directly to
lower development and maintenance costs. But, as the file size and
binary data and number of matrices increase, XML becomes less
efficient to work with.
NOTE 1: because XML can be compressed resulting in much smaller file
sizes, for purposes of our investigation, we are considering
compressed XML as a different file format, cXML.
NOTE 2: we plan to use BASE64 encoding for XML binary data.
Parameters we feel are important include:
1. Time to create the files.
2. File sizes.
3. Time to read the files.
Our plan is to generate fictitious but representative data files of
various sizes, with varying amounts of binary data and matrices, and
record the above parameters. Mapping this information to our use cases
should then give us usable empirical data with which to make
a better informed decision regarding file formats.
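
A measurement harness along the lines of this plan might look like the sketch below. It uses only the Python standard library and covers the XML and cXML cases. The element names, the payload, and the function names are hypothetical, and a single timing run like this is only a starting point, not a rigorous benchmark.

```python
import base64
import gzip
import os
import tempfile
import time
import xml.etree.ElementTree as ET


def build_document(payload):
    # Hypothetical layout: one <record> holding base64-encoded binary data.
    root = ET.Element("record")
    data = ET.SubElement(root, "data", encoding="base64")
    data.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root)


def parse_document(document):
    return base64.b64decode(ET.fromstring(document).find("data").text)


def write_xml(path, payload):
    with open(path, "wb") as fh:
        fh.write(build_document(payload))


def read_xml(path):
    with open(path, "rb") as fh:
        return parse_document(fh.read())


def write_cxml(path, payload):
    # cXML: the same document with whole-file gzip compression.
    with gzip.open(path, "wb") as fh:
        fh.write(build_document(payload))


def read_cxml(path):
    with gzip.open(path, "rb") as fh:
        return parse_document(fh.read())


FORMATS = {"XML": (write_xml, read_xml), "cXML": (write_cxml, read_cxml)}


def measure(payload):
    """Record creation time, file size, and read time for each format."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name, (writer, reader) in FORMATS.items():
            path = os.path.join(tmp, "sample." + name.lower())
            t0 = time.perf_counter()
            writer(path, payload)
            t1 = time.perf_counter()
            restored = reader(path)
            t2 = time.perf_counter()
            if restored != payload:
                raise ValueError(name + " round trip changed the data")
            results[name] = {"create_s": t1 - t0,
                             "size_bytes": os.path.getsize(path),
                             "read_s": t2 - t1}
    return results


# Fictitious 1 MiB payload; real sensor data will compress differently.
payload = bytes(range(256)) * 4096
results = measure(payload)
for name, row in results.items():
    print(name, row)
```

An HDF5 writer/reader pair (for example, one built on h5py) could be added to the `FORMATS` table in the same way, so that all three candidates are measured by the same loop.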
The above study should also give us some insight into the technical
issues related to supporting an HDF5 capability, which will need to
be factored in.
Comments/thoughts on the above are appreciated.
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org