My current open source project is a Directory that is just like
RAMDirectory, but everything is memory-mapped. The idea is it creates a
disk file, opens it, and immediately deletes the file. The file still
exists until the IndexReader/Writer/Searcher closes it, but it can no
longer be found through the file system. The result behaves like a
RAMDirectory, but without the memory limitations.
It's proving to be harder than it looked.
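For readers following along, the create/open/delete idea can be sketched with plain JDK NIO. This is only an illustration, not the Directory implementation described above: the class name UnlinkedMappedFile and the method roundTrip are made up for the example, and it assumes a POSIX-style file system where an open, mapped file can be deleted (on Windows the delete would normally fail).

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: map a temp file, unlink it, keep using the mapping.
class UnlinkedMappedFile {

    // Writes payload through a mapping of an already-deleted file and reads it back.
    // Assumes payload is non-empty and the OS allows deleting an open file.
    static byte[] roundTrip(byte[] payload) throws IOException {
        Path tmp = Files.createTempFile("unlinked-", ".bin");
        try (FileChannel ch = FileChannel.open(
                tmp, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Mapping READ_WRITE beyond the current size extends the file.
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);

            // Remove the directory entry; the data stays reachable via the mapping.
            Files.delete(tmp);

            buf.put(payload);
            buf.flip();
            byte[] out = new byte[payload.length];
            buf.get(out);
            return out;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] out = roundTrip("decrypted index bytes".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

Deleting the file only removes its name. The operating system may still write the mapped pages to the underlying disk blocks, so this hides the data from a directory listing rather than keeping it off the disk.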
The application is to store encrypted indexes in memory, with the
decrypted contents in this non-findable format. I'm in medical document
analysis now, and we can't store anything on disk in the clear.
Lance
On 07/01/2013 07:07 AM, Emmanuel Espina wrote:
Hi Erick! Nice to hear from you again! From time to time my interest
in these "Lucene things" returns and I do some experiments :p
Just to add to this conversation, I found an interesting link to
Mike's blog about memory resident indexes (using another virtual
machine)
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
and also (which is not exactly what I asked but seems related) there
is a Google Summer of Code project to build a memory-resident term
dictionary:
http://www.google-melange.com/gsoc/project/google/gsoc2013/billybob/42001
Thanks
Emmanuel
2013/7/1 Erick Erickson <erickerick...@gmail.com>:
Hey Emma! It's been a while....
Building on what Steven said, here's Uwe's blog on
MMapDirectory and Lucene:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
I've always considered RAMDirectory suitable only for rather
restricted use cases, i.e. when I know without doubt that the index
is both relatively static and bounded. The other use I've
seen is to index single documents on the fly for some reason
(say, complex processing of a single result) and then throw
the index away afterwards.
How are things going?
Erick
On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <ste...@likeness.com> wrote:
On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemman...@gmail.com>
wrote:
I'm building a distributed index (mostly as a research project for
school) and I'm evaluating indexing the entire collection in memory
(like Google, Facebook and others did years ago). The obvious
reason for this is performance, considering that replication will
give me reasonably good durability of the data (despite it being in
volatile memory).
What is the current status of Lucene for this kind of index?
RAMDirectory has a scary warning in its documentation that says it
"is not intended to work with huge indexes", and that sounds more like
it is an implementation for testing rather than something for
production.
Of course there is no real context for this question, because it is a
research topic. Testing its limits would be the closest to a context
I have :p
You could consider MMapDirectory, which will end up putting the active
portions of the index in memory (via the filesystem buffer cache).
The benefit is that you don't completely destroy the Java heap
(RAMDirectory causes immense GC pressure if you are not careful) and you
don't have to commit all of your RAM to index usage all the time.
The downside is that if your working set exceeds the amount of RAM
available for buffer cache, you will get silent performance degradation as
you fall back to disk reads for the missing blocks.
Maybe this is OK for your use case, maybe not.
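To make the heap point concrete, here is a minimal sketch of the JDK mechanism that MMapDirectory is built on, FileChannel.map. It is not Lucene code: the class MappedReadSketch and the method sumBytes are invented for the illustration, and the file is assumed to be non-empty and smaller than 2 GB. The mapped buffer is a direct buffer, so its contents live in the OS page cache rather than on the Java heap.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: read a file through a memory mapping instead of heap arrays.
class MappedReadSketch {

    // Sums the unsigned byte values of the file by touching every mapped page.
    static long sumBytes(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());

            // A mapped buffer is direct: its content is outside the Java heap,
            // so it adds no garbage-collection work.
            if (!buf.isDirect()) {
                throw new IllegalStateException("expected an off-heap buffer");
            }

            long sum = 0;
            while (buf.hasRemaining()) {
                // Pages not in the buffer cache are read from disk here.
                sum += buf.get() & 0xFF;
            }
            return sum;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MappedReadSketch.java <file>");
            return;
        }
        System.out.println(sumBytes(Paths.get(args[0])));
    }
}
```

The loop is also where the downside described above shows up: when the file is larger than the RAM available for buffer cache, each buf.get() on a missing page turns into a disk read, with no error or warning.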
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org