On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemman...@gmail.com> wrote:

> I'm building a distributed index (mostly as a research project for
> school) and I'm evaluating indexing the entire collection in memory
> (as Google, Facebook and others did years ago). The obvious
> reason for this is performance, considering that replication will
> give me reasonably good durability of the data (despite it being in
> volatile memory).
> 
> What is the current status of Lucene for this kind of index?
> The RAMDirectory documentation has a scary warning that says it
> "is not intended to work with huge indexes", and that sounds more like
> an implementation for testing than something for
> production.
> 
> Of course there is no real context for this question, because it is a
> research topic. Testing its limits would be the closest to a context
> I have :p

You could consider MMapDirectory, which will end up putting the active portions
of the index in memory (via the filesystem buffer cache).
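
To illustrate the mechanism, here is a minimal sketch of memory-mapping a file using plain java.nio. It is not Lucene code; the class and method names (MmapSketch, readMapped) are made up for the example. The point is only that the mapped bytes live outside the Java heap and are paged in by the OS on access.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    /** Maps the whole file read-only and decodes its contents as UTF-8. */
    static String readMapped(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping is backed by the OS page cache, not by the Java heap.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // Copying into a heap array is only for the demo; real readers
            // would read from the buffer directly.
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello index".getBytes(StandardCharsets.UTF_8));
        System.out.println(readMapped(tmp));
    }
}
```

With Lucene itself you would not write this by hand; you would open the index through MMapDirectory and let it do the mapping.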

The benefit is that you don't completely destroy the Java heap (RAMDirectory
causes immense GC pressure if you are not careful) and you don't have to
commit all of your RAM to index usage all the time.

The downside is that if your working set exceeds the amount of RAM available
for the buffer cache, performance will silently degrade as reads fall back
to disk for the missing blocks.
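
A rough way to sanity-check this up front is to compare the on-disk index size (an upper bound on the working set) against the memory you expect to have free for caching. The sketch below is only that; the class name, method names, and the 8 GiB default budget are assumptions for illustration, and it only counts files directly inside the given directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexSizeCheck {
    /** Sums the sizes of the regular files directly inside indexDir. */
    static long indexSizeBytes(Path indexDir) throws IOException {
        try (Stream<Path> files = Files.list(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** True if the whole index would fit in the assumed cache budget. */
    static boolean fitsInCache(long indexBytes, long cacheBudgetBytes) {
        return indexBytes <= cacheBudgetBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // Assumed budget: 8 GiB unless a byte count is given as second argument.
        long budget = args.length > 1 ? Long.parseLong(args[1]) : 8L << 30;
        long size = indexSizeBytes(dir);
        System.out.printf("index=%d bytes, budget=%d bytes, fits=%b%n",
                size, budget, fitsInCache(size, budget));
    }
}
```

If the index is larger than the budget you may still be fine, since only the actively used portions need to stay cached, but it is a signal to measure query latency under realistic load.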

Maybe this is OK for your use case, maybe not.

