Maybe we can put together our requested IO operations and submit them for inclusion in NIO for Java 7? http://openjdk.java.net/projects/nio/
On Thu, Jun 11, 2009 at 12:21 PM, Jason Rutherglen <jason.rutherg...@gmail.com> wrote:

> Makes sense.
>
> Currently MMapDirectory doesn't write using mapped byte buffers.
> Would the memory management of the OS behave differently if we
> were writing to the mmapped ByteBuffers as opposed to writing to
> an RAF (like with FSDir)?
>
> > its blind LRU approach is often a poor policy (eg for terms
> > dict, where a binary search could easily suddenly need to visit
> > a random rarely accessed page).
>
> Agreed, it's not the best for termDict.
>
> > Well... locality is still important. Under the hood, mmap on a
> > page miss must hit the disk.
>
> Maybe this is where MappedByteBuffer.load, as Earwin has
> mentioned, comes in handy?
>
> But yeah, we can't do anything with this unless we had a JNI
> library that interacts more directly with the IO system
> (allowing us to configure whether IO is cached etc), which
> perhaps exists or could exist in the future (or Java 7?).
>
> On Thu, Jun 11, 2009 at 2:43 AM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> On Wed, Jun 10, 2009 at 9:24 PM, Jason
>> Rutherglen <jason.rutherg...@gmail.com> wrote:
>> > I read over the LUCENE-1458 comments again. Interesting. I think
>> > the most compelling argument is that the various files we're
>> > normally loading into the heap are, after merging, in the IO
>> > cache. If we can simply reuse the IO cache rather than allocate
>> > a bunch of redundant arrays in heap, we could be better off? I
>> > think this is very compelling for field caches, delDocs, and
>> > bitsets that are tied to segments and loaded after each merge.
>>
>> The OS doesn't have enough information to "know" what data structures
>> are important to Lucene (must stay hot) and which are less so. Its
>> blind LRU approach is often a poor policy (eg for terms dict, where a
>> binary search could easily suddenly need to visit a random rarely
>> accessed page).
>>
>> For example, after merging, all the segments we just *read* from will
>> also be hot, having flushed out other important pages from the IO
>> cache, which is very much not what we want to do. From C, and per-OS,
>> you can inform the OS that it should not cache the bytes read from the
>> file, but from Java we just can't control that.
>>
>> > I think it's possible to write some basic benchmarks to test a
>> > byte[] BitVector vs. a MappedByteBuffer BitVector and see what
>> > happens.
>>
>> Yes, but this is challenging to test properly. On systems with plenty
>> of RAM, the approaches should be similarly fast. On systems starved
>> for RAM, both approaches should thrash miserably. It's the cases in
>> between that we need to test for.
>>
>> > The other potentially interesting angle here is in regard to
>> > realtime updates, where we can implement an mmapped page type of
>> > system so blocks of this stuff can be updated in near realtime,
>> > directly in the mmapped space (similar to how in heap land with
>> > LUCENE-1526 we're looking at breaking up the byte[] into a
>> > byte[][]).
>>
>> But carrying such updates via RAM, like we do now for deletions,
>> should generally be more performant (you never have to put the changes
>> on disk).
>>
>> > Also if we assume data is mmapped, I don't think it matters as much if
>> > the updates on disk are not in sequence? (Whereas today we try
>> > to keep all our files optimized for sequential reads.) Of
>> > course I could be completely wrong. :)
>>
>> Well... locality is still important. Under the hood, mmap on a page
>> miss must hit the disk.
>>
>> Mike
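For the byte[] BitVector vs. MappedByteBuffer BitVector benchmark discussed above, a starting point might look like the following. It is only a sketch under stated assumptions: the class and method names (`BitsBackingSketch`, `Bits`, `HeapBits`, `MappedBits`, `countSet`) are hypothetical and are not Lucene's BitVector API, and the bit layout (bit i in byte i >> 3, mask 1 << (i & 7)) is assumed. The heap variant copies the file into a redundant byte[]; the mapped variant reads the same bytes through the OS page cache, optionally calling MappedByteBuffer.load() as Earwin suggested (a best-effort hint; the OS may still evict the pages). The timing in main is naive (no JIT warmup) and, per Mike's point, tells us little unless run under the "in between" RAM pressure, which this sketch does not set up.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch comparing heap-copied vs. mmapped read-only bit vectors. */
public class BitsBackingSketch {

    /** Minimal read-only bit set (hypothetical, not Lucene's API). */
    interface Bits {
        boolean get(int index);
    }

    /** Heap-resident copy: the file's bytes are duplicated into a Java array. */
    static final class HeapBits implements Bits {
        private final byte[] bytes;
        HeapBits(byte[] bytes) { this.bytes = bytes; }
        public boolean get(int index) {
            // Assumed layout: bit i lives in byte i >> 3 under mask 1 << (i & 7).
            return (bytes[index >> 3] & (1 << (index & 7))) != 0;
        }
    }

    /** Mapped view: bytes stay in the OS page cache; a page miss hits the disk. */
    static final class MappedBits implements Bits {
        private final MappedByteBuffer buffer;
        MappedBits(MappedByteBuffer buffer) { this.buffer = buffer; }
        public boolean get(int index) {
            // Absolute get: does not touch the buffer's position.
            return (buffer.get(index >> 3) & (1 << (index & 7))) != 0;
        }
    }

    static HeapBits loadIntoHeap(Path file) throws IOException {
        return new HeapBits(Files.readAllBytes(file));
    }

    static MappedBits map(Path file, boolean preload) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            if (preload) {
                // Best-effort request to fault the pages in now.
                buf.load();
            }
            // The mapping remains valid after the channel is closed.
            return new MappedBits(buf);
        }
    }

    /** Counts set bits in [0, numBits); this is the loop a benchmark would time. */
    static int countSet(Bits bits, int numBits) {
        int count = 0;
        for (int i = 0; i < numBits; i++) {
            if (bits.get(i)) count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        int numBits = 1 << 20;
        byte[] data = new byte[numBits / 8];
        for (int i = 0; i < numBits; i += 3) {
            data[i >> 3] |= (byte) (1 << (i & 7));
        }
        Path file = Files.createTempFile("bits", ".bin");
        // deleteOnExit avoids failing on platforms that refuse to delete mapped files.
        file.toFile().deleteOnExit();
        Files.write(file, data);

        Bits heap = loadIntoHeap(file);
        Bits mapped = map(file, true);
        long t0 = System.nanoTime();
        int heapCount = countSet(heap, numBits);
        long t1 = System.nanoTime();
        int mappedCount = countSet(mapped, numBits);
        long t2 = System.nanoTime();
        System.out.printf("heap:   %d set bits in %d us%n", heapCount, (t1 - t0) / 1000);
        System.out.printf("mapped: %d set bits in %d us%n", mappedCount, (t2 - t1) / 1000);
    }
}
```

Both variants expose the same read path, so the benchmark isolates where the bytes live (Java heap vs. page cache) rather than differences in bit-twiddling code.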