I had not looked at the core driver, thanks. It seems like a useful thing to be aware of in general, but I don't think it helps in my case. It sounds like it is useful mainly for writing an HDF5 file in memory.

But if you have a big HDF5 file on disk, I don't see how the core driver helps you access it. You could copy the whole thing into an in-memory file, but we don't want a big startup hit like that. Maybe I am missing a way to use the core driver here.

-Philip

On Mon, Dec 6, 2010 at 10:08 PM, Quincey Koziol <[email protected]> wrote:

> Hi Philip,
>
> Have you considered using the 'core' file driver (H5Pset_fapl_core)?
>
> Quincey
>
> On Dec 6, 2010, at 6:52 PM, Philip Winston wrote:
>
>> I mean the code I gave you mmaps the file as a whole, not individual
>> datasets in the file. But, it nonetheless mmaps UNDERNEATH the explicit
>> reads/writes (e.g. H5Dread/H5Dwrite calls) made by the application. So, I
>> am thinking this is nowhere near the paradigm you were hoping for.
>
> I was hoping for a true mmap model. But now I see perhaps that is
> impossible. mmap only works if what is in memory is identical to what is
> on disk; for HDF5, endianness alone can break that assumption, right? Plus
> lots of other things, like chunked datasets.
>
> So for my situation one option is to keep HDF5 around for interchange, but
> at runtime "optimize" to a simple binary format where I can mmap the
> entire dataset. Then I can just read/write anywhere and the OS takes care
> of everything.
>
> It's tempting: coming from a situation where everything is in RAM today,
> it seems like the least work to continue to access data randomly and let
> the OS figure it out. But I don't know how smart that is. Maybe it is kind
> of a red herring: it would work, but it would perform horribly. Maybe
> coming from a situation where everything is in RAM, we have to rethink
> things a lot to make it work off disk, to organize data for coherence so
> we can read big chunks instead of single rows.
>
>> My experience is that for simple queries (give me this hyperslab of
>> data), products like HDF5 are going to give better I/O performance than
>> some RDBMS.
>> But, if you are really talking about highly sophisticated queries, where
>> future reads/writes depend upon other parts of the query and the datasets
>> being queried, that sounds more like an RDBMS than an I/O library sort of
>> thing. Just my two cents. Good luck.
>
> Our data is essentially a tabular representation of a tree. Every row is a
> node in the tree. There are 2-10 values in a row, but tens of millions of
> rows. So in a sense our queries do depend on values as we read them: for
> example, we'll read a value, find the children of a node, read those
> values, and so on.
>
> I imagine HDF5 is best for reading large amounts of data each time. We
> would generally be reading one row at a time: set up one hyperslab, tiny
> read, new hyperslab, tiny read.
>
> We have other uses in mind for HDF5, but for this particular type of data
> I wonder whether it's just not a good fit.
>
> -Philip
>
> On Mon, Dec 6, 2010 at 3:21 PM, Mark Miller <[email protected]> wrote:
>
>> I am not sure if you got an answer to this email, so I thought I would
>> pipe up.
>>
>> Yes, you can do mmap if you'd like. I took HDF5's sec2 Virtual File
>> Driver (VFD) and tweaked it to use mmap instead, just to test how
>> something like this would work. I've attached the (hacked) code. To use
>> it, you are going to have to learn a bit about HDF5 VFDs. Learn about
>> them in File Access Property lists,
>> http://www.hdfgroup.org/HDF5/doc/RM/RM_H5P.html, as well as
>> http://www.hdfgroup.org/HDF5/doc/TechNotes/VFL.html
>>
>> It is something to start with. I don't know if HDF5 has plans for
>> writing an mmap-based VFD, but they really ought to; it is something
>> that is definitely lacking from their supported VFDs currently.
>>
>> Mark
>>
>> On Fri, 2010-12-03 at 17:02, Philip Winston wrote:
>>
>>> We just added HDF5 support in our application. We are using the C API.
>>> Our datasets are 1D and 2D arrays of integers, a pretty simple
>>> structure on disk. Today we have about 5GB of data; we load the whole
>>> thing into RAM, do somewhat random reads, make changes, then overwrite
>>> the old .h5 file.
>>>
>>> I only learned the very minimum of the HDF5 API to accomplish the
>>> above, and it was pretty easy. Now we are looking at supporting much
>>> larger datasets, such that it will no longer be practical to have the
>>> whole thing in memory. This is where I'm confused about exactly what
>>> HDF5 offers vs. what is up to the application, and about the best way
>>> to do things in the application.
>>>
>>> Ideally, what I want is an mmap-like interface: just a raw pointer
>>> which "magically" pages data off disk in response to reads, and writes
>>> data back to disk in response to writes. Does HDF5 have something like
>>> this, or can/do people end up writing something like this on top of
>>> HDF5? Today our datasets are contiguous; I assume we'd want chunked
>>> datasets instead, but it's not clear to me how much "paging"
>>> functionality chunking buys you and how much you have to implement.
>>>
>>> Thanks for any ideas or pointers.
>>>
>>> -Philip
>>
>> --
>> Mark C. Miller, Lawrence Livermore National Laboratory
>> ================!!LLNL BUSINESS ONLY!!================
>> [email protected] urgent: [email protected]
>> T:8-6 (925)-423-5901 M/W/Th:7-12,2-7 (530)-753-8511

_______________________________________________
Hdf-forum is for HDF software users discussion.
[email protected]
http://mail.hdfgroup.org/mailman/listinfo/hdf-forum_hdfgroup.org
