Re: In memory index (current status in Lucene)

Lance Norskog Mon, 01 Jul 2013 14:42:07 -0700

My current open source project is a Directory that is just like RAMDirectory, but everything is memory-mapped. The idea is it creates a disk file, opens it, and immediately deletes the file. The file still exists until the IndexReader/Writer/Searcher closes it. But, it cannot be found from the file system. This is just like a RAMDirectory, but without memory limitations.
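
The create, open, map, delete sequence described above can be sketched with plain JDK classes. This is only an illustration under stated assumptions, not Lance's actual Directory: the class name UnlinkedMappedFile and the method roundTrip are hypothetical, no Lucene API is involved, and it assumes a POSIX-style file system where an open file can be unlinked (on Windows the delete is expected to fail while the file is open or mapped).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Lucene.
class UnlinkedMappedFile {

    // Writes text into a memory-mapped temp file whose name has already been
    // removed from the file system, then reads it back through the mapping.
    static String roundTrip(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        try {
            Path path = Files.createTempFile("unlinked-", ".bin");
            try (FileChannel ch = FileChannel.open(
                    path, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // Mapping READ_WRITE grows the file to the requested size.
                MappedByteBuffer buf =
                        ch.map(FileChannel.MapMode.READ_WRITE, 0, payload.length);

                // Assumption: POSIX semantics. The name goes away, but the
                // data stays reachable while the file is open or mapped.
                Files.delete(path);
                if (Files.exists(path)) {
                    throw new IllegalStateException("file is still visible: " + path);
                }

                buf.put(payload);   // write through the mapping
                buf.flip();         // rewind so the same bytes can be read back
                byte[] out = new byte[payload.length];
                buf.get(out);
                return new String(out, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("decrypted index bytes"));
    }
}
```

Note that the pages of such a file can still be written back to the disk blocks by the operating system, so whether this meets a "nothing on disk in the clear" requirement depends on the environment.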


It's proving to be harder than it looked.

The application is to store encrypted indexes in memory, with the decrypted contents in this non-findable format. I'm in medical document analysis now, and we can't store anything on disk in the clear.


Lance

On 07/01/2013 07:07 AM, Emmanuel Espina wrote:

Hi Erick! Nice to hear from you again! From time to time my interest
in these "Lucene things" returns and I do some experiments :p

Just to add to this conversation, I found an interesting link to
Mike's blog about memory resident indexes (using another virtual
machine) 
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html
and also (which is not exactly what I asked but seems related) there
is a Google Summer of Code project to build a memory-resident term
dictionary:
http://www.google-melange.com/gsoc/project/google/gsoc2013/billybob/42001

Thanks
Emmanuel


2013/7/1 Erick Erickson <erickerick...@gmail.com>:

Hey Emma! It's been a while....

Building on what Steven said, here's Uwe's blog on
MMapDirectory and Lucene:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I've always considered RAMDirectory for rather restricted
use-cases. I.e. if I know without doubt that the index
is both relatively static and bounded. The other use I've
seen is to use it to index single documents on-the-fly for
some reason (say complex processing of a single result)
then throw it out afterwards.

How are things going?

Erick



On Fri, Jun 28, 2013 at 5:36 PM, Steven Schlansker <ste...@likeness.com> wrote:

On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemman...@gmail.com>
wrote:

I'm building a distributed index (mostly as a research project for
school) and I'm evaluating indexing the entire collection in memory
(like Google, Facebook and others did years ago). The obvious
reason for this is performance, considering that replication will
give me reasonably good durability of the data (despite it being in
volatile memory).

What is the current status of Lucene for this kind of index?
RAMDirectory has a scary warning in its documentation that says it
"is not intended to work with huge indexes", and that sounds more like
an implementation for testing than something for
production.

Of course there is no real context for this question, because it is a
research topic. Testing its limits would be the closest to a context
I have :p

You could consider MMapDirectory, which will end up putting the active
portions of the index in memory (via the filesystem buffer cache).

The benefit is that you don't completely destroy the Java heap
(RAMDirectory causes immense GC pressure if you are not careful) and
you don't have to commit all of your RAM to index usage all the time.

The downside is that if your working set exceeds the amount of RAM
available for buffer cache, you will get silent performance degradation as
you fall back to disk reads for the missing blocks.

Maybe this is OK for your use case, maybe not.
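
To make the buffer-cache point above concrete, here is a minimal JDK-only sketch of reading a file through a memory mapping, the same mechanism (FileChannel.map) that MMapDirectory builds on. The class MappedRead and its methods are hypothetical names for illustration and do not use any Lucene API. The mapped bytes live in the OS page cache rather than on the Java heap, and a page that is not resident is simply read from disk when first touched, which is where the silent slowdown comes from.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; not part of Lucene.
class MappedRead {

    // Sums the unsigned byte values of a file by reading it through a
    // read-only mapping. A single mapping is limited to Integer.MAX_VALUE
    // bytes, so this sketch only handles files below 2 GB.
    static long sumBytes(Path path) {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            MappedByteBuffer buf =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            long sum = 0;
            while (buf.hasRemaining()) {
                // Each access may be served from the page cache or fault in
                // the page from disk; the calling code cannot tell which.
                sum += buf.get() & 0xFF;
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the given bytes to a temp file and sums them via the mapping.
    static long demo(byte[] data) {
        try {
            Path path = Files.createTempFile("mapped-read-", ".bin");
            path.toFile().deleteOnExit();
            Files.write(path, data);
            return sumBytes(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(new byte[] {1, 2, 3}));
    }
}
```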


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
