Dawid Weiss created LUCENE-8438:
-----------------------------------

             Summary: RAMDirectory speed improvements and cleanup
                 Key: LUCENE-8438
                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Dawid Weiss
            Assignee: Dawid Weiss
         Attachments: capture-4.png

RAMDirectory screams for a cleanup. It is used and abused in many places, and 
even if we discourage its use in favor of native (mmapped) buffers, there seem 
to be benefits to keeping RAMDirectory available (quick throw-away indexes 
without the need to set up an external tmpfs, for example).

Currently RAMDirectory performs very poorly under concurrent load. The 
implementation is also open to all sorts of abuse: the streams can be reset 
and are used all over the place as temporary buffers, even where no 
RAMDirectory is involved. This complicates the implementation and is 
pretty confusing.

As an example of how dramatically slow RAMDirectory is under concurrent load, 
consider this PoC pseudo-benchmark. It creates a single monolithic segment with 
500K very short documents (single field, with norms). The index is ~60MB once 
created. We then run semi-complex Boolean queries on top of that index from N 
concurrent threads. The attached capture-4 shows the result (queries per second 
over 5-second spans) for a varying number of concurrent threads on an AWS 
machine with 32 CPUs available (of which 16 seem to be real and 16 
hyper-threaded). The red line at the bottom (which drops below 
single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an 
alternative implementation I wrote that uses ByteBuffers. Yes, it's slower than 
the native mmapped implementation, but a *lot* faster than the current 
RAMDirectory (and more GC-friendly because it uses dynamic progressive block 
scaling internally).
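
For readers who want to reproduce the shape of this measurement, here is a 
minimal sketch of a "queries per second over a fixed span, from N threads" 
loop. It is not the PoC code from [1]: the class name QpsProbe, the method 
measureQps, and the placeholder workload are all hypothetical, and it uses only 
the JDK. In a real run the Runnable would execute one Boolean query against the 
index under test.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical throughput harness; an illustration, not the code from [1]. */
final class QpsProbe {

  /**
   * Calls {@code query} in a loop from {@code threads} threads for roughly
   * {@code spanMillis} milliseconds and returns completed calls per second.
   */
  static double measureQps(int threads, long spanMillis, Runnable query) {
    LongAdder completed = new LongAdder();
    AtomicBoolean stop = new AtomicBoolean(false);
    CountDownLatch start = new CountDownLatch(1);
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      for (int i = 0; i < threads; i++) {
        pool.execute(() -> {
          try {
            start.await(); // release all workers at the same moment
          } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
          }
          while (!stop.get()) {
            query.run();
            completed.increment();
          }
        });
      }
      long t0 = System.nanoTime();
      start.countDown();
      Thread.sleep(spanMillis);
      long done = completed.sum(); // snapshot before asking workers to stop
      long elapsedNanos = System.nanoTime() - t0;
      return done / (elapsedNanos / 1e9);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while measuring", e);
    } finally {
      stop.set(true);
      pool.shutdown();
      try {
        pool.awaitTermination(10, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }

  public static void main(String[] args) {
    // Placeholder workload (an assumption): sums a shared read-only array.
    byte[] data = new byte[4096];
    Runnable fakeQuery = () -> {
      int sum = 0;
      for (byte b : data) {
        sum += b;
      }
      if (sum != 0) {
        throw new AssertionError("array is all zeros, so sum must be 0");
      }
    };
    // Short spans keep the demo quick; the numbers above used 5-second spans.
    for (int threads : new int[] {1, 2, 4}) {
      System.out.printf("%d thread(s): %.0f q/s%n",
          threads, measureQps(threads, 200, fakeQuery));
    }
  }
}
```

Plotting the returned value against the thread count for each directory 
implementation gives a chart of the same kind as capture-4. The sketch does no 
warm-up and takes a single span per thread count, so treat its numbers as rough.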

I'll clean it all up and prepare a patch this week. The PoC code discussed 
above is at [1] but I wouldn't spend any time on this yet.

[1] https://github.com/dweiss/ramdir2




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

