[
https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571426#comment-16571426
]
Dawid Weiss edited comment on LUCENE-8438 at 8/7/18 10:33 AM:
--------------------------------------------------------------
This shows the QPS performance on an AWS 36-core machine (18 physical cores) with
increasing thread count and various directory implementations – BBDIR is
ByteBuffersDirectory, FSDir is Lucene's native FSDirectory, RAMDIR is the current
RAMDirectory. The variations of BBDIR relate to which IndexInput is returned:
MANY_BUFS is multiple ByteBuffers (exactly the buffers written to IndexOutput),
ONE_BUF is the same implementation, but with the buffers rewritten into a single
ByteBuffer (which results in contiguous access and fewer block-boundary hits),
BYTE_ARRAY is rewritten into a contiguous array and wrapped in
ByteArrayIndexInput, LUCENE_BUFS is the original ByteBuffers wrapped in
Lucene's ByteBuffer handling code.
My opinion is to leave LUCENE_BUFS as the default since it exhibits high
performance and doesn't require contiguous memory allocation.
!capture-1.png|width=600!
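The difference between the multi-buffer and contiguous variants comes down to per-read addressing. A minimal sketch of the idea (the class, block size, and method names below are my own illustrative assumptions, not code from the patch): a MANY_BUFS-style input must translate every file position into a (block, offset) pair, while a ONE_BUF/BYTE_ARRAY-style input is a single lookup.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only, not Lucene's actual IndexInput code.
public class BlockRead {
    static final int BLOCK_SHIFT = 10;                // assumed 1 KB blocks
    static final int BLOCK_MASK = (1 << BLOCK_SHIFT) - 1;

    // MANY_BUFS style: every read pays block/offset arithmetic, and a
    // multi-byte value near a block end must stitch bytes from two buffers.
    static byte readByte(ByteBuffer[] blocks, long pos) {
        int block = (int) (pos >>> BLOCK_SHIFT);
        int offset = (int) (pos & BLOCK_MASK);
        return blocks[block].get(offset);
    }

    // ONE_BUF / BYTE_ARRAY style: one bounds-checked lookup, no boundaries.
    static byte readByteContiguous(ByteBuffer buf, long pos) {
        return buf.get((int) pos);
    }

    public static void main(String[] args) {
        ByteBuffer[] blocks = {ByteBuffer.allocate(1024), ByteBuffer.allocate(1024)};
        blocks[1].put(0, (byte) 42);          // byte just past the first block boundary
        System.out.println(readByte(blocks, 1024L));  // 42
    }
}
```

A multi-byte read (say, a long starting at position 1020) would additionally have to assemble bytes from two blocks; that stitching is the block-boundary cost the contiguous variants avoid.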
> RAMDirectory speed improvements and cleanup
> -------------------------------------------
>
> Key: LUCENE-8438
> URL: https://issues.apache.org/jira/browse/LUCENE-8438
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: capture-1.png, capture-4.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and
> even if we discourage its use in favor of native (mmapped) buffers, there
> seem to be benefits of keeping RAMDirectory available (quick throw-away
> indexes without the need to setup external tmpfs, for example).
> Currently RAMDirectory performs very poorly under concurrent loads. The
> implementation is also open to all sorts of abuse – the streams can be
> reset and are used all over the place as temporary buffers, even without
> the presence of RAMDirectory itself. This complicates the implementation and
> is pretty confusing.
> An example of how dramatically slow RAMDirectory is under concurrent load,
> consider this PoC pseudo-benchmark. It creates a single monolithic segment
> with 500K very short documents (single field, with norms). The index is ~60MB
> once created. We then run semi-complex Boolean queries on top of that index
> from N concurrent threads. The attached capture-4 shows the result (queries
> per second over 5-second spans) for a varying number of concurrent threads on
> an AWS machine with 32 CPUs available (of which 16 appear to be physical
> cores and 16 hyper-threaded). The red line at the bottom (which drops below
> single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an
> alternative implementation I wrote that uses ByteBuffers. Yes, it's slower
> than the native mmapped implementation, but a *lot* faster than the current
> RAMDirectory (and more GC-friendly because it uses dynamic progressive block
> scaling internally).
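To illustrate the "dynamic progressive block scaling" mentioned above, here is a hedged sketch (block sizes, caps, and names are my own assumptions, not the actual RAMDirectory2 code): each newly allocated block doubles in size up to a cap, so tiny throw-away files stay tiny while large files need only a logarithmic number of allocations.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not the patch's actual output implementation.
public class GrowableOutput {
    private static final int MIN_SHIFT = 10;  // first block: 1 KB (assumed)
    private static final int MAX_SHIFT = 20;  // cap block size at 1 MB (assumed)

    private final List<ByteBuffer> blocks = new ArrayList<>();
    private ByteBuffer current;

    // Allocate the next block, doubling its size until the cap is reached.
    private void ensureCapacity() {
        if (current == null || !current.hasRemaining()) {
            int shift = Math.min(MIN_SHIFT + blocks.size(), MAX_SHIFT);
            current = ByteBuffer.allocate(1 << shift);
            blocks.add(current);
        }
    }

    public void writeByte(byte b) {
        ensureCapacity();
        current.put(b);
    }

    public long size() {
        long size = 0;
        for (ByteBuffer b : blocks) size += b.position();
        return size;
    }

    public int blockCount() {
        return blocks.size();
    }
}
```

With 1 KB as the first block, writing 5000 bytes allocates blocks of 1 KB, 2 KB, and 4 KB – three allocations instead of five fixed-size ones, and no large up-front buffer for small files.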
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)