On Jun 28, 2013, at 2:29 PM, Emmanuel Espina <espinaemman...@gmail.com> wrote:
> I'm building a distributed index (mostly as a research project for
> school) and I'm evaluating indexing the entire collection in memory
> (like Google, Facebook and others have done years ago). The obvious
> reason for this is performance, considering that the replication will
> give me reasonably good durability of the data (despite being in
> volatile memory).
>
> What is the current status of Lucene for this kind of index?
> RAMDirectory in its documentation has a scary warning that says that
> it "is not intended to work with huge indexes", and that sounds more
> like it is an implementation for testing rather than something for
> production.
>
> Of course there is no real context for this question, because it is a
> research topic. Testing its limits would be the closest to a context
> I have :p

You could consider MMapDirectory, which memory-maps the index files so
that the active portions of the index end up in memory (via the
filesystem buffer cache).

The benefit is that you don't completely destroy the Java heap
(RAMDirectory causes immense GC pressure if you are not careful), and
you don't have to commit all of your RAM to index usage all the time.

The downside is that if your working set exceeds the amount of RAM
available for the buffer cache, you will get silent performance
degradation as you fall back to disk reads for the missing blocks.
Maybe this is OK for your use case, maybe not.
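
To make the mechanism concrete, here is a minimal sketch of a memory-mapped read using plain java.nio. This is not Lucene code, only an illustration of the OS facility that MMapDirectory builds on; the class name MmapSketch and the helper readByteAt are made up for this example.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
class MmapSketch {

    // Maps the whole file read-only and returns the byte at the given offset.
    // The mapped pages live in the OS buffer cache, not on the Java heap.
    // A single mapping is limited to Integer.MAX_VALUE bytes, so this sketch
    // only handles files smaller than 2 GB.
    static byte readByteAt(Path file, int offset) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The first access to a page that is not cached triggers a disk read.
            return buf.get(offset);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mmap-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello index".getBytes(StandardCharsets.US_ASCII));
        System.out.println((char) readByteAt(tmp, 6)); // prints "i"
    }
}
```

Nothing in the code distinguishes a cached page from an uncached one: the same buf.get() call either returns straight from memory or blocks on a disk read. That is why the degradation is silent once the working set outgrows the buffer cache.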