Hi Philip,
it sounds as if using hyperslabs would do what you need; see here, for instance:
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5S.html#Dataspace-SelectHyperslab
Hyperslabs allow you to read only a subset of a dataset, so you can iterate over memory-sized parts of a dataset that is itself larger than the available RAM.
At this point it is not relevant whether the dataset is chunked or how large the chunks are, but chunking may influence performance significantly. Once you have a working setup using hyperslabs on an unchunked dataset, you will want to play with the internal chunk parameters to investigate performance.
Hyperslabs are good for n-dimensional datasets. Would that address your needs?
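A minimal sketch of that pattern in C (the file name, dataset name and the 1024x1024 block size are just placeholders; error checking is omitted):

#include <stdlib.h>
#include "hdf5.h"

int main(void)
{
    hid_t file   = H5Fopen("big.h5", H5F_ACC_RDWR, H5P_DEFAULT);
    hid_t dset   = H5Dopen2(file, "/data", H5P_DEFAULT);
    hid_t fspace = H5Dget_space(dset);

    /* Select one RAM-sized block of the dataset in the file. */
    hsize_t start[2] = {0, 0};
    hsize_t count[2] = {1024, 1024};
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count, NULL);

    /* Matching in-memory dataspace and buffer. */
    hid_t mspace = H5Screate_simple(2, count, NULL);
    int *buf = malloc(count[0] * count[1] * sizeof *buf);

    H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
    /* ... modify buf in memory ... */
    H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);

    free(buf);
    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Dclose(dset);
    H5Fclose(file);
    return 0;
}

Looping over the start offsets gives you the iteration over the whole dataset.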
Werner
On Tue, 07 Dec 2010 00:57:45 +0100, Philip Winston <[email protected]> wrote:
Thanks for the info and code!
Given this mmap VFD isn't yet part of the library, I'm wondering: does anyone do what we're talking about today with the existing HDF5 library? To summarize, we have a dataset that doesn't fit in memory. We want to perform "random" reads, pulling only portions into RAM. Then we make changes in RAM. Then we want to write out only the changed portions.
I'm guessing a chunked file is the starting point here, but what else is needed? Is there a layer on top to coordinate things? To hold a list of modified chunks?
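To make concrete the kind of thing I imagine (all names and sizes below are made up):

#include "hdf5.h"

int main(void)
{
    hsize_t dims[2]  = {65536, 65536};
    hsize_t chunk[2] = {1024, 1024};

    hid_t file  = H5Fcreate("big.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    hid_t space = H5Screate_simple(2, dims, NULL);

    /* Chunked layout: storage is split into 1024x1024 pieces. */
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    H5Pset_chunk(dcpl, 2, chunk);

    hid_t dset = H5Dcreate2(file, "/data", H5T_NATIVE_INT, space,
                            H5P_DEFAULT, dcpl, H5P_DEFAULT);

    /* Overwrite one chunk-sized region; the rest stays untouched on disk. */
    hsize_t start[2] = {2048, 4096};
    hid_t fspace = H5Dget_space(dset);
    H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, chunk, NULL);
    hid_t mspace = H5Screate_simple(2, chunk, NULL);

    static int block[1024][1024];
    H5Dwrite(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, block);

    H5Sclose(mspace);
    H5Sclose(fspace);
    H5Pclose(dcpl);
    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    return 0;
}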
Is it even a good idea to attempt this usage model with HDF5? I read one person suggest that HDF5 is good for bulk read-only data, but that he would use a database for "complex" data that requires changes. I wonder if our situation is just better suited to a database?
Where do people draw the line? What do you consider an appropriate usage model for HDF5 vs. a database or something else? Thanks for any input; we have "adopted" HDF5 but really we don't understand it that well yet.
-Philip
On Mon, Dec 6, 2010 at 3:21 PM, Mark Miller <[email protected]> wrote:
I am not sure if you got an answer to this email and so I thought I
would pipe up.
Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
Driver (VFD) and tweaked it to use mmap instead, just to test how
something like this would work. I've attached the (hacked) code. To use
it, you are going to have to learn a bit about HDF5 VFDs. Learn about
them in File Access Property lists,
http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as
http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html
It is something to start with. I don't know if HDF5 has plans for
writing an mmap-based VFD, but they really ought to; it is something
that is definitely lacking from their supported VFDs currently.
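The general pattern looks like this (purely illustrative; sec2 is the stock driver, and a custom mmap driver would be hooked in the same way through its own H5Pset_fapl_* call, or with H5FDregister() plus H5Pset_driver()):

#include "hdf5.h"

int main(void)
{
    /* Choose the virtual file driver through a file access property list. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_sec2(fapl);   /* swap in the driver of your choice here */

    hid_t file = H5Fopen("big.h5", H5F_ACC_RDWR, fapl);
    /* ... ordinary dataset I/O; the VFD only changes how bytes reach disk ... */
    H5Fclose(file);
    H5Pclose(fapl);
    return 0;
}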
Mark
On Fri, 2010-12-03 at 17:02, Philip Winston wrote:
> We just added HDF5 support in our application. We are using the C
> API. Our datasets are 1D and 2D arrays of integers, a pretty simple
> structure on disk. Today we have about 5GB of data and we load the
> whole thing into RAM, do somewhat random reads, make changes, then
> overwrite the old .h5 file.
>
> I only learned the bare minimum of the HDF5 API to accomplish the
> above, and it was pretty easy. Now we are looking at supporting much
> larger datasets, such that it will no longer be practical to have the
> whole thing in memory. This is where I'm confused on exactly what
> HDF5 offers vs. what is up to the application, and on what's the best
> way to do things in the application.
>
> Ideally in my mind what I want is an mmap like interface, just a raw
> pointer which "magically" pages stuff off disk in response to reads,
> and writes stuff back to disk in response to writes. Does HDF5 have
> something like this, or can/do people end up writing something like
> this on top of HDF5? Today our datasets are contiguous and I'm assuming
> we'd want chunked datasets instead, but it's not clear to me how much
> "paging" functionality chunking buys you and how much you have to
> implement.
>
> Thanks for any ideas or pointers.
>
> -Philip
--
Mark C. Miller, Lawrence Livermore National Laboratory
================!!LLNL BUSINESS ONLY!!================
[email protected] urgent: [email protected]
T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
--
___________________________________________________________________________
Dr. Werner Benger                          Visualization Research
Laboratory for Creative Arts and Technology (LCAT)
Center for Computation & Technology at Louisiana State University (CCT/LSU)
211 Johnston Hall, Baton Rouge, Louisiana 70803
Tel.: +1 225 578 4809    Fax.: +1 225 578 5362