Is there something that I am missing? I see lots of references to
using "memory mapped" files to "dramatically" improve performance.
I don't think this is the case at all. At the lowest levels, it is
somewhat more efficient from a CPU standpoint, but with a decent OS
cache the IO performance difference is going to be negligible.
The primary benefit of memory mapped files is simplicity in code
(although Java needs another layer on top; think of C), and the
file can be treated as a randomly accessible memory array.
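To make the "memory array" point concrete, here is a minimal sketch of what that extra Java layer looks like, assuming the standard java.nio FileChannel.map API. The class name, method name, and temp file are hypothetical and only there for illustration.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedArrayDemo {
    // Map the whole file read-only and return the byte at the given offset.
    // (Hypothetical helper; a real reader would map once and reuse the buffer.)
    static byte byteAt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Absolute get: the file is indexed much like a byte array.
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, new byte[] {10, 20, 30, 40, 50});
        System.out.println(byteAt(file, 3)); // prints 40
    }
}
```

In C the mapping would be a plain pointer; in Java the MappedByteBuffer wrapper is the additional layer.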
From my OS design experience, the page at
http://en.wikipedia.org/wiki/Memory-mapped_file is incorrect.
Even if the memory mapped file is mapped into the virtual memory
space, unless you have specialized memory controllers and disk systems,
when a page fault occurs, the OS loads the page just as any other.
The difference with direct IO is that there is first a simple
translation from file position to disk page, and then the OS disk page
cache is checked. Almost exactly the same thing occurs with a memory
mapped file. The memory address is accessed; if the page is not in
memory, a page fault occurs, and the page is loaded from the file (it
may be loaded from the OS disk cache in this process).
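The two paths can be put side by side in a rough sketch. This assumes "direct IO" here means an explicit positional read through FileChannel (not O_DIRECT), and the class and method names are hypothetical. Both methods return the same value; they differ only in how the OS is asked for the page.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReadPathsDemo {
    // Explicit read: the file position is handed to the OS, which checks
    // its page cache and goes to disk only on a miss.
    static long readLongDirect(Path file, long position) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
            while (buf.hasRemaining()) {
                if (ch.read(buf, position + buf.position()) < 0) {
                    throw new EOFException("file too short");
                }
            }
            buf.flip();
            return buf.getLong();
        }
    }

    // Mapped read: touching the address may page-fault, and the OS then
    // loads the page from the file (or from its cache).
    static long readLongMapped(Path file, long position) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // A single mapping is limited to int offsets.
            return buf.getLong((int) position);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("read-paths", ".bin");
        file.toFile().deleteOnExit();
        byte[] data = ByteBuffer.allocate(4 * Long.BYTES)
                .putLong(100L).putLong(200L).putLong(300L).putLong(400L).array();
        Files.write(file, data);
        System.out.println(readLongDirect(file, 16)); // prints 300
        System.out.println(readLongMapped(file, 16)); // prints 300
    }
}
```

Timing these two against a cold and a warm cache would be the way to check the claim; the sketch itself only shows that the access patterns are equivalent.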
The point being, if the page is not in the cache (which is probably
the case with a large index), the time to load the page is far
greater than the difference between the IO address translation and
the memory address lookup.
If all of the pages of the index can fit in memory, a properly
configured system is going to have them in the page cache anyway....
On Dec 23, 2008, at 8:22 PM, Marvin Humphrey wrote:
On Tue, Dec 23, 2008 at 05:51:43PM -0800, Jason Rutherglen wrote:
Are there other implementation options?
Here's the plan for Lucy/KS:
1) Design index formats that can be memory mapped rather than slurped,
   bringing the cost of opening/reopening an IndexReader down to a
   negligible level.
2) Enable segment-centric sorted search. (LUCENE-1483)
3) Implement tombstone-based deletions, so that the cost of deleting
   documents scales with the number of deletions rather than the size
   of the index.
4) Allow 2 concurrent writers: one for small, fast updates, and one for
   big background merges.
Marvin Humphrey
---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org