Hi Philip,
Have you considered using the 'core' file driver (H5Pset_fapl_core)?
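A minimal sketch of what that looks like; "data.h5" and the 1 MiB
allocation increment are placeholders:

    #include "hdf5.h"

    int main(void)
    {
        hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);

        /* Grow the in-memory image in 1 MiB steps; the nonzero third
           argument writes the image back to the file on H5Fclose(). */
        H5Pset_fapl_core(fapl, 1024 * 1024, 1);

        hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);

        /* ... H5Dread/H5Dwrite as usual; all I/O now hits memory ... */

        H5Fclose(file);
        H5Pclose(fapl);
        return 0;
    }

With the backing store enabled, the whole file is read into memory on
open and flushed back on close, so everything in between is a pure
memory operation.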
Quincey
On Dec 6, 2010, at 6:52 PM, Philip Winston wrote:
> I mean the code I gave you mmaps the file as a whole, not individual
> datasets in the file. But it nonetheless mmaps UNDERNEATH the explicit
> reads/writes (e.g. H5Dread/H5Dwrite calls) made by the application. So
> I am thinking this is nowhere near the paradigm you were hoping for.
>
> I was hoping for a true mmap model, but now I see that is perhaps
> impossible. mmap only works if what is in memory is identical to what's
> on disk, and for HDF5, endianness alone can break that assumption,
> right? Plus lots of other things, like chunked datasets.
>
> So for my situation, one option is to keep HDF5 around for
> interchange, but at runtime "optimize" to a simple binary format where
> I can mmap the entire dataset. Then I can just read/write anywhere and
> the OS takes care of everything.
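>
> Roughly what I have in mind; a sketch, assuming a flat file of
> native-endian int32 values (the names here are made up):
>
>   #include <fcntl.h>
>   #include <stdint.h>
>   #include <sys/mman.h>
>   #include <sys/stat.h>
>   #include <unistd.h>
>
>   /* Map the whole file and hand back a raw pointer; the OS pages
>      data in on reads and, with MAP_SHARED, writes stores back. */
>   int32_t *map_table(const char *path, size_t *n_out)
>   {
>       int fd = open(path, O_RDWR);
>       if (fd < 0)
>           return NULL;
>
>       struct stat st;
>       fstat(fd, &st);
>
>       void *p = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
>                      MAP_SHARED, fd, 0);
>       close(fd);  /* the mapping stays valid after close */
>
>       if (p == MAP_FAILED)
>           return NULL;
>       *n_out = (size_t)st.st_size / sizeof(int32_t);
>       return (int32_t *)p;
>   }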
>
> It's tempting: coming from a situation where everything is in RAM
> today, it seems like the least work to keep accessing data randomly and
> let the OS figure it out. But I don't know how smart that is. Maybe
> it's kind of a red herring: it would work, but perform horribly. Maybe,
> coming from a situation where everything is in RAM, we have to rethink
> things a lot to make them work off disk, organizing the data for
> coherent access so we can read big chunks instead of single rows.
>
> My experience is that for simple queries (give me this hyperslab of
> data), products like HDF5 are going to give better I/O performance than
> some RDBMS. But if you are really talking about highly sophisticated
> queries, where future reads/writes depend upon other parts of the query
> and the datasets being queried, that sounds more like an RDBMS than an
> I/O-library sort of thing. Just my two cents. Good luck.
>
> Our data is essentially a tabular representation of a tree. Every row
> is a node in the tree. There are 2-10 values in a row, but tens of
> millions of rows. So in a sense our queries do depend on values as we
> read them, because, for example, we'll read a value, find the children
> of a node, read those values, and so on.
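>
> To make that concrete, one row/node might look something like this
> (the exact layout is invented for illustration):
>
>   #include <stdint.h>
>
>   /* One node of the tree, stored as one row of the table. */
>   typedef struct {
>       int64_t first_child;   /* row index of first child, -1 if leaf  */
>       int64_t next_sibling;  /* row index of next sibling, -1 if none */
>       int32_t values[8];     /* the 2-10 payload values per node      */
>   } node_row;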
>
> I imagine HDF5 is best when reading large amounts of data each time.
> We would generally always be reading one row at a time: set up one
> hyperslab, tiny read, new hyperslab, tiny read.
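>
> Each per-node lookup would be something like this (a sketch; the
> dataset handle and row width are assumed):
>
>   #include "hdf5.h"
>
>   /* Read one row of a 2D int dataset: select a 1 x ncols hyperslab
>      in the file, then issue a tiny read for just those values. */
>   static void read_row(hid_t dset, hsize_t row, hsize_t ncols, int *buf)
>   {
>       hsize_t start[2] = { row, 0 };
>       hsize_t count[2] = { 1, ncols };
>
>       hid_t fspace = H5Dget_space(dset);
>       H5Sselect_hyperslab(fspace, H5S_SELECT_SET, start, NULL, count,
>                           NULL);
>       hid_t mspace = H5Screate_simple(2, count, NULL);
>
>       H5Dread(dset, H5T_NATIVE_INT, mspace, fspace, H5P_DEFAULT, buf);
>
>       H5Sclose(mspace);
>       H5Sclose(fspace);
>   }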
>
> We have other uses in mind for HDF5, but for this particular type of
> data I wonder if it's just not a good fit.
>
> -Philip
>
>
>
> > On Mon, Dec 6, 2010 at 3:21 PM, Mark Miller <[email protected]> wrote:
> > I am not sure if you got an answer to this email and so I thought I
> > would pipe up.
> >
> > Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
> > Driver (VFD) and tweaked it to use mmap instead, just to test how
> > something like this would work. I've attached the (hacked) code. To
> > use it, you are going to have to learn a bit about HDF5 VFDs. Learn
> > about them in File Access Property Lists,
> > http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as
> > http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html
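> >
> > The gist is that a driver is selected through a File Access Property
> > List. For example, the stock sec2 driver goes like this; the mmap
> > version in the attached code slots in the same way through its own
> > setter (the file name here is a placeholder):
> >
> >   #include "hdf5.h"
> >
> >   hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
> >   H5Pset_fapl_sec2(fapl);   /* pick the VFD on the property list */
> >   hid_t file = H5Fopen("data.h5", H5F_ACC_RDWR, fapl);
> >   /* ... normal HDF5 calls; the VFD handles the low-level I/O ... */
> >   H5Fclose(file);
> >   H5Pclose(fapl);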
> >
> >
> > It is something to start with. I don't know if HDF5 has plans for
> > writing an mmap-based VFD, but they really ought to; it is something
> > that is definitely lacking from their supported VFDs currently.
> >
> > Mark
> >
> > On Fri, 2010-12-03 at 17:02, Philip Winston wrote:
> > > We just added HDF5 support in our application. We are using the C
> > > API. Our datasets are 1D and 2D arrays of integers, a pretty simple
> > > structure on disk. Today we have about 5GB of data and we load the
> > > whole thing into RAM, do somewhat random reads, make changes, then
> > > overwrite the old .h5 file.
> > >
> > > I only learned a very minimal amount of the HDF5 API to accomplish
> > > the above, and it was pretty easy. Now we are looking at supporting
> > > much larger datasets, such that it will no longer be practical to
> > > have the whole thing in memory. This is where I'm confused on
> > > exactly what HDF5 offers vs. what is up to the application, and on
> > > what's the best way to do things in the application.
> > >
> > > Ideally, in my mind, what I want is an mmap-like interface: just a
> > > raw pointer which "magically" pages stuff off disk in response to
> > > reads, and writes stuff back to disk in response to writes. Does
> > > HDF5 have something like this, or can/do people end up writing
> > > something like this on top of HDF5? Today our datasets are
> > > contiguous, and I'm assuming we'd want chunked datasets instead,
> > > but it's not clear to me how much "paging" functionality chunking
> > > buys you and how much you have to implement.
> > >
> > > Thanks for any ideas or pointers.
> > >
> > > -Philip
> >
> > --
> > Mark C. Miller, Lawrence Livermore National Laboratory
> > ================!!LLNL BUSINESS ONLY!!================
> > [email protected] urgent: [email protected]
> > T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511
_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org