Yes I think hyperslabs would be an essential tool for us.

Our hyperslabs would essentially be single rows.  I'm a little worried
about how that would perform, but I should just try it and see.  We need to
be able to extend the dataset, so we need chunking for that, if nothing
else.

As per my other email, I'm worried that reading/writing single rows may not
be a good fit for HDF5.  But again, I should really just experiment and
see.  Thanks.

-Philip



On Mon, Dec 6, 2010 at 7:03 PM, Werner Benger <[email protected]> wrote:

> Hi Philip,
>
>  it sounds as if using Hyperslabs would do what you need, see here for
> instance:
>
> http://www.hdfgroup.org/HDF5/doc/RM/RM_H5S.html#Dataspace-SelectHyperslab
>
> Hyperslabs allow you to read only a subset of a dataset, and thus to
> iterate over memory-fitting parts of a dataset that is itself larger
> than available RAM.
>
> At this point it is not relevant whether the dataset is chunked or how
> large the chunks are, but chunking may influence performance
> significantly. Once you have a working system using hyperslabs on an
> unchunked dataset, you will want to play with the internal chunk
> parameters to investigate performance.
>
> Hyperslabs are good for n-dimensional datasets. Would that address your
> needs?
>
>          Werner
>
>
> On Tue, 07 Dec 2010 00:57:45 +0100, Philip Winston <[email protected]>
> wrote:
>
> Thanks for the info and code!
>
> Given that this mmap VFD isn't yet part of the library, I'm wondering:
> does anyone do what we're talking about today with the existing HDF5
> library?  To summarize, we have a dataset that doesn't fit in memory.  We
> want to perform "random" reads, reading only portions into RAM.  Then we
> make changes in RAM.  Then we want to write out only the changed portions.
>
> I'm guessing a chunked file is the starting point here, but what else is
> needed?  Is there a layer on top to coordinate things?  To hold a list of
> modified chunks?
>
> Is it even a good idea to attempt this usage model with HDF5?  I read one
> person suggest that HDF5 is good for bulk read-only data but that he would
> use a database for "complex" data that requires changes.  I wonder if our
> situation is just better suited to a database?
>
> Where do people draw the line?  What do you consider an appropriate usage
> model for HDF5 vs. a database or something else?  Thanks for any input; we
> have "adopted" HDF5, but really we don't understand it that well yet.
>
> -Philip
>
>
>
> On Mon, Dec 6, 2010 at 3:21 PM, Mark Miller <[email protected]> wrote:
>
>> I am not sure if you got an answer to this email and so I thought I
>> would pipe up.
>>
>> Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
>> Driver (VFD) and tweaked it to use mmap instead, just to test how
>> something like this would work. I've attached the (hacked) code. To use
>> it, you are going to have to learn a bit about HDF5 VFDs. Learn about
>> them in File Access Property lists,
>> http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as
>>
>> http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html
>>
>>
>> It is something to start with. I don't know if HDF5 has plans for
>> writing an mmap-based VFD, but they really ought to; it is something
>> that is definitely lacking from their supported VFDs currently.
>>
>> Mark
>>
>> On Fri, 2010-12-03 at 17:02, Philip Winston wrote:
>> > We just added HDF5 support in our application.  We are using the C
>> > API. Our datasets are 1D and 2D arrays of integers, a pretty simple
>> > structure on disk. Today we have about 5GB of data and we load the
>> > whole thing into RAM, do somewhat random reads, make changes, then
>> > overwrite the old .h5 file.
>> >
>> > I only learned a very minimum amount of the HDF5 API to accomplish the
>> > above, and it was pretty easy.  Now we are looking at supporting much
>> > larger datasets, such that it will no longer be practical to have the
>> > whole thing in memory.  This is where I'm confused on exactly what
>> > HDF5 offers vs. what is up to the application, and on what's the best
>> > way to do things in the application.
>> >
>> > Ideally in my mind what I want is an mmap-like interface, just a raw
>> > pointer which "magically" pages stuff off disk in response to reads,
>> > and writes stuff back to disk in response to writes.  Does HDF5 have
>> > something like this, or can/do people end up writing something like
>> > this on top of HDF5?  Today our datasets are contiguous, and I'm
>> > assuming we'd want chunked datasets instead, but it's not clear to me
>> > how much "paging" functionality chunking buys you and how much you
>> > have to implement.
>> >
>> > Thanks for any ideas or pointers.
>> >
>> > -Philip
>> --
>> Mark C. Miller, Lawrence Livermore National Laboratory
>> ================!!LLNL BUSINESS ONLY!!================
>> [email protected]      urgent: [email protected]
>> T:8-6 (925)-423-5901    M/W/Th:7-12,2-7 (530)-753-8511
>>
>> _______________________________________________
>> Hdf-forum is for HDF software users discussion.
>> [email protected]
>> http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
>>
>>
>
>
>
> --
> ___________________________________________________________________________
> Dr. Werner Benger Visualization Research
> Laboratory for Creative Arts and Technology (LCAT)
> Center for Computation & Technology at Louisiana State University (CCT/LSU)
> 211 Johnston Hall, Baton Rouge, Louisiana 70803
> Tel.: +1 225 578 4809 Fax.: +1 225 578-5362
>
