Some Windows comments:

* SWMR is not well tested on Windows, since the current SWMR test harness is 
based on shell scripts and makes use of fork(). There's no reason why it 
shouldn't work on Windows, though; it's just not tested there at this time. 
We'll be adding at least some minimal testing in the near future, but it may 
be a while before we have a full test suite.

* NTFS and parallel file systems like GPFS should support SWMR. SMB-style 
network access (e.g., Windows file shares) will NOT support SWMR, however, 
since we can't guarantee write ordering. This is similar to NFS, which is also 
not supported for SWMR access.

Also, if you are using HDF5 1.10.0, be sure to call H5Pset_libver_bounds() to 
request the latest file format. The newer data structures are much more 
efficient than the backward-compatible defaults. You'll lose HDF5 1.8 
compatibility, though, so keep that in mind.

https://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html#Property-SetLibverBounds
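
For example, something along these lines with the C API should do it (a 
minimal sketch; the file name "example.h5" is just a placeholder):

    #include "hdf5.h"

    int main(void)
    {
        /* Request the latest file format by setting both the lower and
         * upper library-version bounds to H5F_LIBVER_LATEST on a file
         * access property list. */
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
        H5Pset_libver_bounds(fapl, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);

        /* Create the file using that property list. */
        hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

        /* ... create datasets and write data here ... */

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }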

Dana Robinson
Software Engineer
The HDF Group

From: Hdf-forum [mailto:[email protected]] On Behalf Of 
SOLTYS Radoslaw
Sent: Tuesday, January 26, 2016 5:56 AM
To: [email protected]
Subject: [Hdf-forum] Working with lots of HDF5 files

We're looking into replacing our custom storage design for time-series data 
with HDF5, focusing mainly on HDF5 version 1.10 for its SWMR capability, since 
our custom storage already does this.
To find the best layout, we drafted a few test cases and started from a 
tutorial code sample in C++, adjusting it to replicate our current database 
structure of one file per signal. So we are creating new empty files in a 
loop, and there we already ran into problems:

- the HDF5 garbage collector allocates a lot of memory as soon as the files 
are created; we tried to tune it with setGcReferences(), but without success;

- once memory use reaches 2 GB, the HDF5 create function throws the exception 
"no space available for allocation" (we're running 64-bit Windows 8 with 16 GB 
of RAM).

I have a few questions at this point:

- Can we reduce the amount of memory used by the garbage collector? If so, 
how?

- Taking a step back: is the HDF5 API designed to handle thousands of files in 
practice?

- Or would it be better to have a single file with the same number of datasets 
in it? (We're talking about a few thousand datasets, each with several million 
rows.)

Thanks for your kind support.

