See comments embedded below...

From: Hdf-forum <[email protected]> on behalf of "Dogrul, Can@DWR" <[email protected]>
Reply-To: HDF Users Discussion List <[email protected]>
Date: Wednesday, February 24, 2016 7:55 AM
To: "[email protected]" <[email protected]>
Subject: [Hdf-forum] Many datasets in an HDF5 file
> Hello,
>
> I am planning to store potentially tens of thousands of 2-D datasets, along with their relevant attributes, in an HDF5 file. Each dataset will have the same number of rows but a different number of columns. I am rather new to HDF5, so I thought I'd ask about any potential pitfalls before diving into coding. Are there any memory or performance issues I should be concerned about due to the large number of datasets being dealt with?

If it's practical, I think you would want to try to distribute the datasets among several groups in a group hierarchy of some modest depth, maybe 2-6 levels depending on dataset count. Putting all datasets in a single group is probably not the best approach, as it leads to a rather large single structure necessary to manage all the members of that group.

> And what about the file size? Currently, the data is stored in native Fortran sequential binary format. The file sizes range from tens of GBytes to over 100 GBytes depending on the application they are generated from.
> Should I expect a file size that is much larger than its Fortran binary counterpart, or about the same size?

That depends. Is this a *lot* of tiny datasets, or a lot of large-ish datasets? I think dataset header overheads are on the order of half a kilobyte. It's worse if you chunk the datasets (i.e. use the chunked storage layout, H5D_CHUNKED, set via H5Pset_chunk when you create the datasets). If your average dataset size is, say, 20x that (i.e. >= 10 KB), then I think the file size difference will NOT be significant.

Hope that helps.

Jon

> Any information would be greatly appreciated.
>
> Thanks,
>
> **************************************************
> Emin C. Dogrul, Ph.D., P.E.
> Water Resources Engineer
> Hydrologic Models Development Unit
> California Department of Water Resources
> Bay-Delta Office
> 1416 9th Street, Rm 252A
> Sacramento, CA 95814
> Phone: (916) 654 7018
> Fax: (916) 653 6077
> e-mail: [email protected]
> **************************************************
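The grouping advice above can be sketched in a few lines. This is a pure-Python illustration of one possible scheme (the `group_path` helper and its fanout/depth parameters are hypothetical, not part of any HDF5 API); actual group and dataset creation would go through the HDF5 library itself, e.g. `h5gcreate_f` from Fortran or `create_group`/`create_dataset` in h5py.

```python
# Sketch: spread many dataset names across a modest group hierarchy instead
# of one flat group. Hypothetical naming scheme; real creation would use the
# HDF5 API (h5gcreate_f in Fortran, or h5py's create_group/create_dataset).

def group_path(index, fanout=64, depth=2):
    """Map a dataset index to a path like '/g00/g57/ds12345'."""
    parts = []
    i = index
    for _ in range(depth):
        parts.append("g%02d" % (i % fanout))
        i //= fanout
    return "/" + "/".join(reversed(parts)) + "/ds%d" % index

# With fanout=64 and depth=2 there are 64*64 = 4096 leaf groups, so even
# 50,000 datasets average only ~12 members per group instead of 50,000
# members in one flat group.
print(group_path(0))      # -> /g00/g00/ds0
print(group_path(12345))  # -> /g00/g57/ds12345
```

The fanout and depth are tunable: deeper or wider hierarchies trade a few extra group lookups for smaller per-group member lists.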
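The "20x the per-dataset overhead" rule of thumb above can be checked with a quick back-of-the-envelope calculation. Note that the ~0.5 KB figure is the estimate from the reply, not an exact HDF5 constant, and this ignores chunked-storage B-tree overhead:

```python
# Back-of-the-envelope metadata overhead, assuming ~0.5 KB of header
# metadata per dataset (the reply's rough estimate, not an exact figure).

HEADER_OVERHEAD = 512  # bytes of metadata per dataset (assumed)

def overhead_fraction(n_datasets, avg_dataset_bytes):
    """Fraction of the total file taken up by per-dataset metadata."""
    meta = n_datasets * HEADER_OVERHEAD
    data = n_datasets * avg_dataset_bytes
    return meta / (meta + data)

# Tens of thousands of tiny (512-byte) datasets: metadata is half the file.
print(f"{overhead_fraction(50_000, 512):.0%}")        # -> 50%
# Average dataset >= 20x the overhead (~10 KB): overhead becomes negligible.
print(f"{overhead_fraction(50_000, 10 * 1024):.1%}")  # -> 4.8%
```

So for the file sizes mentioned (tens of GB across tens of thousands of datasets, i.e. roughly MB-scale datasets on average), the HDF5 file should come out close to the size of the Fortran binary original.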
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://lists.hdfgroup.org/mailman/listinfo/hdf-forum_lists.hdfgroup.org
Twitter: https://twitter.com/hdf5
