[ https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574743#comment-16574743 ]
Dawid Weiss commented on LUCENE-8438:
-------------------------------------

I'll take the above plan as lazy consensus, yes? :) I'll create sub-tasks for the above and start with adding deprecations first, then hold until the 8.x branch is cut before I integrate the rest of this stuff.

> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
>                 Key: LUCENE-8438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8438
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: capture-1.png, capture-4.png
>
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and
> even if we discourage its use in favor of native (mmapped) buffers, there
> seem to be benefits of keeping RAMDirectory available (quick throw-away
> indexes without the need to set up an external tmpfs, for example).
>
> Currently RAMDirectory performs very poorly under concurrent loads. The
> implementation is also open to all sorts of abuses: the streams can be
> reset and are used all over the place as temporary buffers, even without
> the presence of RAMDirectory itself. This complicates the implementation and
> is pretty confusing.
>
> As an example of how dramatically slow RAMDirectory is under concurrent
> load, consider this PoC pseudo-benchmark. It creates a single monolithic
> segment with 500K very short documents (single field, with norms). The index
> is ~60MB once created. We then run semi-complex Boolean queries on top of
> that index from N concurrent threads. The attached capture-4 shows the
> result (queries per second over 5-second spans) for a varying number of
> concurrent threads on an AWS machine with 32 CPUs available (of which 16
> seem to be real and 16 hyper-threaded). That red line at the bottom (which
> drops compared to single-threaded performance) is the current RAMDirectory.
>
> RAMDirectory2 is an alternative implementation I wrote that uses
> ByteBuffers. Yes, it's slower than the native mmapped implementation, but
> a *lot* faster than the current RAMDirectory (and more GC-friendly because
> it uses dynamic progressive block scaling internally).

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
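
The issue description mentions "dynamic progressive block scaling" over ByteBuffers but does not spell it out. The following is only a minimal sketch of one plausible reading, not the actual RAMDirectory2 code: an append-only store that allocates heap ByteBuffers whose size doubles up to a cap, so small throw-away indexes stay small while large ones end up with relatively few, large blocks. The class name `ProgressiveBlockBuffer`, its methods, and the doubling policy are all hypothetical and chosen for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene code): an append-only byte store backed by
 * heap ByteBuffers. Each new block is twice the size of the previous one,
 * capped at maxBlockSize. The doubling policy is an assumption.
 */
final class ProgressiveBlockBuffer {
    private final int maxBlockSize;
    private final List<ByteBuffer> blocks = new ArrayList<>();
    private int nextBlockSize;
    private ByteBuffer current;
    private long size;

    ProgressiveBlockBuffer(int initialBlockSize, int maxBlockSize) {
        if (initialBlockSize <= 0 || maxBlockSize < initialBlockSize) {
            throw new IllegalArgumentException("need 0 < initialBlockSize <= maxBlockSize");
        }
        this.nextBlockSize = initialBlockSize;
        this.maxBlockSize = maxBlockSize;
    }

    /** Appends length bytes from src, allocating new blocks as needed. */
    void writeBytes(byte[] src, int offset, int length) {
        while (length > 0) {
            if (current == null || !current.hasRemaining()) {
                allocateBlock();
            }
            int chunk = Math.min(length, current.remaining());
            current.put(src, offset, chunk);
            offset += chunk;
            length -= chunk;
            size += chunk;
        }
    }

    /** Allocates the next block and doubles the size of the one after it, up to the cap. */
    private void allocateBlock() {
        current = ByteBuffer.allocate(nextBlockSize);
        blocks.add(current);
        nextBlockSize = (int) Math.min((long) nextBlockSize * 2, maxBlockSize);
    }

    long size() {
        return size;
    }

    int blockCount() {
        return blocks.size();
    }

    /** Copies all written bytes into one array (only suitable for small contents). */
    byte[] toArray() {
        byte[] out = new byte[Math.toIntExact(size)];
        int pos = 0;
        for (ByteBuffer block : blocks) {
            // Read through a duplicate so the write position of the block is untouched.
            ByteBuffer view = block.duplicate();
            view.flip();
            int n = view.remaining();
            view.get(out, pos, n);
            pos += n;
        }
        return out;
    }
}
```

With an initial block size of 4 and a cap of 16, writing 30 bytes allocates blocks of 4, 8, 16 and 16 bytes. A real directory implementation would also need random-access reads and a thread-safety story for concurrent readers, which this sketch leaves out.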