[ https://issues.apache.org/jira/browse/LUCENE-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972932#comment-15972932 ]
Steve Mason commented on LUCENE-7778:
-------------------------------------
Yes, we're using small temporary indexes to do on-the-fly matching of documents
to queries (not vice versa, which is the regular Lucene use case).
Luwak actually uses a MemoryIndex internally if you give it a batch of one
document. If you have a number of documents to search it switches to a
RAMDirectory. With the batch sizes we're using, Luwak / RAMDirectory performance
surpasses MemoryIndex by a factor of 2-3.
The per-check cost of MemoryIndex appears to be constant: one query checked
against one page will always take roughly X milliseconds. If you have Y queries
and Z pages, the time taken to search will be roughly X * Y * Z milliseconds.
With RAMDirectory, search time doesn't grow linearly with the number of
documents in the index like this. It might take 10 milliseconds to search an
index of 10 documents, but it'll only take 20 milliseconds to search an index
of 100 documents (I'm making up the numbers here, but that's the effect we're
observing).
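To make the arithmetic above concrete, here is a minimal sketch of the two
scaling behaviours. The class and method names are hypothetical, and the
inputs are the made-up illustrative figures from the paragraph above, not
measurements.

```java
// Back-of-the-envelope model only; names are hypothetical and the figures
// are the illustrative numbers from the comment, not benchmark results.
final class MatchCostModel {

    /** Constant cost per (query, page) pair: total is X * Y * Z. */
    static double linearTotalMillis(double xMillisPerPair, long queries, long pages) {
        return xMillisPerPair * queries * pages;
    }

    /** Average cost per document when a whole batch is searched in one index. */
    static double perDocumentMillis(double batchMillis, long documents) {
        return batchMillis / documents;
    }

    public static void main(String[] args) {
        // Linear model: 1 ms per pair, 1 query, 100 pages.
        System.out.println(linearTotalMillis(1.0, 1, 100));  // 100.0
        // Batched figures from the text: 10 ms for 10 docs, 20 ms for 100 docs.
        System.out.println(perDocumentMillis(10.0, 10));     // 1.0
        System.out.println(perDocumentMillis(20.0, 100));    // 0.2
    }
}
```

Under these illustrative numbers the per-document cost falls as the batch
grows, which is the sub-linear effect being described.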
When using MMapDirectory, it seems that the cost of I/O starts to come into
play. With both RAMDirectory and MemoryIndex, the time to index the document
doesn't really show up as a factor (the contention in this issue was observed
when _reading_ from a RAMDirectory). With MMap it's a big factor (there also
seems to be some more contention / blocking / thread joining going on, which
looks pretty hairy).
> Remove synchronized from high-contention methods on RAMFile
> -----------------------------------------------------------
>
> Key: LUCENE-7778
> URL: https://issues.apache.org/jira/browse/LUCENE-7778
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/store
> Reporter: Steve Mason
> Priority: Minor
>
> When benchmarking RAMDirectory access via multiple threads, the methods
> {{RAMFile::numBuffers}} and {{RAMFile::getBuffer}} show up blocking threads
> fairly frequently.
> By removing the {{synchronized}} keyword from these methods our internal
> benchmarks show a 2x performance increase under concurrent load.
> I don't think removing {{synchronized}} from these methods is a problem as
> they are read-only and write access to these fields is not synchronized.
> LUCENE-2779 also implies that some of the locking on RAMDirectory is not
> necessary.
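
The change proposed in the issue can be illustrated with a small,
self-contained sketch. This is not Lucene's RAMFile source: BufferListSketch
and all of its members are hypothetical stand-ins that only mirror the shape
of the two read methods named above. The sketch assumes the buffers are fully
written by one thread before any reader thread is started, which is what makes
the unlocked reads safe here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a RAMFile-like buffer list; NOT Lucene's code.
final class BufferListSketch {
    private final List<byte[]> buffers = new ArrayList<>();

    // Called by a single writer thread before any readers are started.
    void addBuffer(int size) { buffers.add(new byte[size]); }

    // Locked readers: every call from every thread contends for one monitor.
    synchronized int numBuffersLocked() { return buffers.size(); }
    synchronized byte[] getBufferLocked(int index) { return buffers.get(index); }

    // Unlocked readers: no monitor, so concurrent readers never block each other.
    int numBuffers() { return buffers.size(); }
    byte[] getBuffer(int index) { return buffers.get(index); }

    /** Sums buffer lengths from several threads using the unlocked readers. */
    static long sumLengths(BufferListSketch file, int threads) {
        AtomicLong total = new AtomicLong();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                long local = 0;
                for (int b = 0; b < file.numBuffers(); b++) {
                    local += file.getBuffer(b).length;
                }
                total.addAndGet(local);
            });
            // Thread.start() publishes the writes made before it to the new thread.
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return total.get();
    }
}
```

If a file could still be appended to while other threads read it, the
unlocked variant would need some other form of safe publication, so the sketch
only covers the read-after-write-completes case.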
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)