RE: Pooling in RAMFile

Shad Storhaug Tue, 12 Dec 2017 09:05:18 -0800

Pantazis,

Per the notes in RAMDirectory 
(https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/RAMDirectory.html):



Warning: This class is not intended to work with huge indexes. Everything 
beyond several hundred megabytes will waste resources (GC cycles), because it 
uses an internal buffer size of 1024 bytes, producing millions of byte[1024] 
arrays. This class is optimized for small memory-resident indexes. It also has 
bad concurrency on multithreaded environments.

It is recommended to materialize large indexes on disk and use MMapDirectory 
(https://lucene.apache.org/core/4_8_0/core/org/apache/lucene/store/MMapDirectory.html),
which is a high-performance directory implementation working directly on the 
file system cache of the operating system, so copying data to Java heap space 
is not useful.


In short, it sounds like you are attempting to use RAMDirectory for something 
it is not meant for – that is, large amounts of data. RAMDirectory has 
practical uses when you are testing and do not want to persist to disk, and in 
certain production scenarios where an index is small enough to reside in RAM. 
Otherwise, you should usually persist the index to disk.
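
To put the warning above in concrete terms, here is a rough back-of-the-envelope 
calculation in Java. It is illustrative only: RamBufferEstimate and 
buffersNeeded are names made up for this sketch, it assumes the 1024-byte 
buffer size quoted in the warning, and it ignores per-array object overhead.

```java
/** Illustrative estimate only; not part of Lucene or Lucene.NET. */
final class RamBufferEstimate {
    // Buffer size quoted in the RAMDirectory warning.
    static final int BUFFER_SIZE = 1024;

    /** Number of BUFFER_SIZE-byte arrays needed to hold indexBytes (rounded up). */
    static long buffersNeeded(long indexBytes) {
        return (indexBytes + BUFFER_SIZE - 1) / BUFFER_SIZE;
    }

    public static void main(String[] args) {
        long fiveHundredMiB = 500L * 1024 * 1024;
        long oneGiB = 1024L * 1024 * 1024;
        System.out.println(buffersNeeded(fiveHundredMiB)); // prints 512000
        System.out.println(buffersNeeded(oneGiB));         // prints 1048576
    }
}
```

Each of those arrays is a separate heap object that the garbage collector has 
to track, which is where the wasted GC cycles come from.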

Do note there is an alternative implementation named MemoryIndex in 
Lucene.Net.Memory 
(https://www.nuget.org/packages/Lucene.Net.Memory/4.8.0-beta00005), 
documentation here 
(https://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html),
 which generally has better performance than RAMDirectory, although it is 
limited to a single in-memory document.

As for why certain design decisions were made, I suggest you direct your 
question to the Lucene mailing lists 
(https://lucene.apache.org/core/discussion.html). All we can tell you here is 
that (with some exceptions to make the API more .NET-like) we have faithfully 
ported the design the way it was in Lucene 4.8.0, but nobody here 
was involved in the design decisions. Do note there are also some helpful books 
about Lucene available on Amazon.com that go into some detail about many of the 
components and how to make use of them.

Thanks,
Shad Storhaug (NightOwl888)
Lucene.NET PMC Member

From: Pantazis Deligiannis [mailto:[email protected]]
Sent: Tuesday, December 12, 2017 5:30 PM
To: [email protected]
Subject: Pooling in RAMFile

Hello,
I am quite a new user of Lucene, and I was going through the source code trying 
to understand some parts of the implementation.

I was wondering if it would be possible to use pooling inside RAMFile for the 
byte arrays that get allocated via the NewBuffer method (especially since the 
BUFFER_SIZE seems to be fixed as 1024 in RAMOutputStream), and if not what is 
the exact reason? Is it because of thread safety, since lots of 
(publicly-facing) APIs are accessing RAMFile (and potentially allocating new 
buffers) and these could be called by arbitrary threads, which would require 
synchronization which would be really expensive?

By the way, I understand that NewBuffer is virtual, so a subclass that 
overrides it can allocate buffers from a custom solution (e.g. pooling), but 
I am mostly wondering what the reasoning is for the base implementation 
provided by Lucene.
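
To make the idea concrete, here is a minimal Java sketch of what such pooling 
might look like. It does not use Lucene at all: BufferPool, SimpleRamFile and 
PooledRamFile are hypothetical names, and newBuffer only mirrors the shape of 
the overridable method mentioned above. The pool is built on a bounded, 
lock-based queue, so every acquire and release pays the synchronization cost 
asked about earlier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical thread-safe pool of fixed-size byte arrays (not a Lucene class). */
final class BufferPool {
    private final int bufferSize;
    private final ArrayBlockingQueue<byte[]> free;

    BufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.free = new ArrayBlockingQueue<>(maxPooled); // maxPooled must be >= 1
    }

    int bufferSize() { return bufferSize; }

    int pooledCount() { return free.size(); }

    /** Reuses a pooled buffer if one is available, otherwise allocates a new one. */
    byte[] acquire() {
        byte[] buffer = free.poll(); // returns null when the pool is empty
        if (buffer == null) {
            return new byte[bufferSize];
        }
        Arrays.fill(buffer, (byte) 0); // clear stale data before reuse
        return buffer;
    }

    /** Returns a buffer to the pool; it is simply dropped if the pool is full. */
    void release(byte[] buffer) {
        if (buffer.length == bufferSize) {
            free.offer(buffer);
        }
    }
}

/** Hypothetical stand-in for a RAMFile-like class with an overridable allocator. */
class SimpleRamFile {
    protected final List<byte[]> buffers = new ArrayList<>();

    /** Base behaviour assumed here: allocate a fresh array every time. */
    protected byte[] newBuffer(int size) {
        return new byte[size];
    }

    final byte[] addBuffer(int size) {
        byte[] buffer = newBuffer(size);
        buffers.add(buffer);
        return buffer;
    }
}

/** Subclass that takes its buffers from the pool instead of allocating. */
final class PooledRamFile extends SimpleRamFile {
    private final BufferPool pool;

    PooledRamFile(BufferPool pool) {
        this.pool = pool;
    }

    @Override
    protected byte[] newBuffer(int size) {
        // The pool only serves one size; fall back to plain allocation otherwise.
        return size == pool.bufferSize() ? pool.acquire() : new byte[size];
    }

    /** Hands all buffers back to the pool once the file is no longer needed. */
    void close() {
        for (byte[] buffer : buffers) {
            pool.release(buffer);
        }
        buffers.clear();
    }
}
```

Note that a scheme like this needs a reliable point at which buffers are 
returned (close() in the sketch), which is an extra lifecycle requirement that 
plain allocation does not have.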

Many thanks,
Pantazis
