Pantazis,

Per the notes in RAMDirectory (https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/RAMDirectory.html):

"Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of byte[1024] arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use MMapDirectory (https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/MMapDirectory.html), which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to Java heap space is not useful."

In short, it sounds like you are attempting to use RAMDirectory for something it is not meant for: large amounts of data. RAMDirectory has practical uses when you are testing and do not want to persist to disk, and in certain production scenarios where an index is small enough to reside in RAM. Other than that, you should usually persist the index to disk.

Do note there is an alternative implementation named MemoryIndex in Lucene.Net.Memory (https://www.nuget.org/packages/Lucene.Net.Memory/4.8.0-beta00005), documentation here (https://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html), which generally has better performance than RAMDirectory, although it is limited to a single in-memory document.

As for why certain design decisions were made, I suggest you direct your question to the Lucene mailing lists (https://lucene.apache.org/core/discussion.html). All we can tell you here is that (with some exceptions to make the API more .NET-like) we have faithfully ported the design the way it was in Lucene 4.8.0, but nobody here was involved in the design decisions.

Do note there are also some helpful books about Lucene available on Amazon.com that go into some detail about many of the components and how to make use of them.
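To put the buffer size from the RAMDirectory warning in perspective, here is a rough back-of-the-envelope sketch in Java. The class and method names are made up for illustration, and the count ignores per-file rounding (each file gets its own buffers, so the real number would be somewhat higher).

```java
// Hypothetical illustration, not Lucene code.
class RamBufferCount {
    // Buffer size quoted in the RAMDirectory warning.
    static final int BUFFER_SIZE = 1024;

    // Number of fixed-size buffers needed to hold indexBytes (ceiling division).
    static long buffersNeeded(long indexBytes) {
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long[] sizes = {100L << 20, 500L << 20, 4L << 30}; // 100 MiB, 500 MiB, 4 GiB
        for (long size : sizes) {
            System.out.println(size + " bytes -> " + buffersNeeded(size) + " buffers");
        }
    }
}
```

By this estimate a 500 MiB index already needs 512,000 separate arrays, and a 4 GiB index needs over four million, all of which the garbage collector has to track.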
Thanks,
Shad Storhaug (NightOwl888)
Lucene.NET PMC Member

From: Pantazis Deligiannis [mailto:[email protected]]
Sent: Tuesday, December 12, 2017 5:30 PM
To: [email protected]
Subject: Pooling in RAMFile

Hello,

I am quite a new user of Lucene, and I was going through the source code trying to understand some parts of the implementation.

I was wondering if it would be possible to use pooling inside RAMFile for the byte arrays that get allocated via the NewBuffer method (especially since the BUFFER_SIZE seems to be fixed at 1024 in RAMOutputStream), and if not, what is the exact reason? Is it because of thread safety, since lots of public-facing APIs access RAMFile (and potentially allocate new buffers) and could be called by arbitrary threads, which would require synchronization that would be really expensive?

By the way, I understand that NewBuffer is virtual, so a subclass that overrides it can allocate buffers from a custom solution (e.g. pooling), but I am mostly wondering what the reasoning is for the base implementation provided by Lucene.

Many thanks,
Pantazis
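For readers wondering what the pooling idea in the question might look like, here is a minimal Java sketch of a thread-safe pool of fixed-size byte arrays. BufferPool and all of its methods are hypothetical and not part of Lucene or Lucene.NET. The assumption is that a RAMFile subclass would call acquire() from its overridden NewBuffer (newBuffer in Java Lucene). Where and when such a subclass would call release() is left open here.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical helper, not Lucene code: recycles fixed-size byte arrays.
final class BufferPool {
    private final int bufferSize;
    // Lock-free queue, so acquire/release need no explicit synchronization.
    private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();

    BufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Returns a zeroed buffer, reusing a released one when available.
    byte[] acquire() {
        byte[] buffer = free.poll();
        if (buffer == null) {
            return new byte[bufferSize];
        }
        Arrays.fill(buffer, (byte) 0); // clear stale data from the previous user
        return buffer;
    }

    // Hands a buffer back to the pool; buffers of the wrong size are dropped.
    void release(byte[] buffer) {
        if (buffer != null && buffer.length == bufferSize) {
            free.offer(buffer);
        }
    }

    // Number of buffers currently waiting for reuse.
    int available() {
        return free.size();
    }

    public static void main(String[] args) {
        BufferPool pool = new BufferPool(1024);
        byte[] first = pool.acquire();
        first[0] = 42;
        pool.release(first);
        byte[] second = pool.acquire();
        System.out.println(first == second); // true: the same array was reused
        System.out.println(second[0]);       // 0: contents were cleared on reuse
    }
}
```

The concurrent queue is one way to address the thread-safety concern raised in the question without a lock, at the cost of clearing each buffer on reuse. Whether this would be a net win over plain allocation would need measuring.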
