[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I wanted to share my micro load test results with you, to make sure you all 
understand the scale of the bottleneck as we are experiencing it.

For an optimized index with 4700+ documents (i.e. small), NOT-query throughput 
varies by a factor of 35 under heavy load. Using the 2.3.0 release I got 20 tps. 
With the volatile/synchronized fix suggested, I got 700 tps. The limiting factor 
at 700 tps was the CPU on the computer generating load, so the gap may be even 
wider. Furthermore, the more documents that exist in the index, the worse this 
may get, as it synchronizes on every single iteration through the loop.

An argument can be made that this is just a necessary evil, and that we *must* 
synchronize on this to allow for updates during reads. I have two questions 
regarding that.

1. What is the cost of a dirty read in that case? Is it stale data, incorrect 
data, or a corrupted system?
2. What is more prevalent in a production system? Indexes with no deletes, 
indexes with *some* deletes, or indexes with frequent deletes?

Do we need to have one class that does it all, or should we consider two 
different implementations for two different uses? What about a read-only 
SegmentReader for read-only slaves in production environments?
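To make the shape of the problem concrete, here is a minimal, hypothetical sketch — the class names and fields below are mine for illustration, not Lucene's actual SegmentReader or MatchAllScorer. It shows why short-circuiting on a deletion count avoids the synchronized per-document call entirely when an index has no deletes:

```java
// Hypothetical sketch, not Lucene code: models the contended pattern.
final class SegmentReaderSketch {
    private final boolean[] deleted;   // per-document deletion flags
    private int deletedCount;

    SegmentReaderSketch(int maxDoc) {
        this.deleted = new boolean[maxDoc];
    }

    // Cheap check done once per candidate document; when it returns
    // false, the synchronized hot path below is never entered.
    boolean hasDeletions() {
        return deletedCount > 0;
    }

    // Synchronized per-document check: the lock every searching
    // thread piles up on in an optimized (single-segment) index.
    synchronized boolean isDeleted(int doc) {
        return deleted[doc];
    }

    synchronized void delete(int doc) {
        if (!deleted[doc]) {
            deleted[doc] = true;
            deletedCount++;
        }
    }

    int maxDoc() {
        return deleted.length;
    }
}

final class MatchAllScorerSketch {
    // Counts live documents the way a match-all scorer iterates them.
    // With no deletions, the synchronized isDeleted() call is skipped
    // on every iteration -- the essence of the proposed fix.
    static int countLiveDocs(SegmentReaderSketch reader) {
        int count = 0;
        for (int id = 0; id < reader.maxDoc(); id++) {
            if (!reader.hasDeletions() || !reader.isDeleted(id)) {
                count++;
            }
        }
        return count;
    }
}
```

Note that hasDeletions() here is deliberately unsynchronized; in a read-only slave where no deletes occur during searching, the worst case of a racy read is briefly treating a just-deleted document as live, which feeds back into question 1 above.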




> Avoidable synchronization bottleneck in MatchAllDocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as 
> a potential synchronization bottleneck. However, this bottleneck actually 
> occurs at a higher level that wasn't focused on (at least in the threads I 
> read).
> In every case I saw where a stack trace was provided to show the lock/block, 
> higher in the stack you see the MatchAllScorer.next() method. In Solr 
> particularly, this scorer is used for "NOT" queries. We saw incredibly poor 
> performance (an order of magnitude) on our load tests for NOT queries, due to 
> this bottleneck. The problem is that every single document is run through 
> this isDeleted() method, which is synchronized. Having an optimized index 
> exacerbates this issue, as there is only a single SegmentReader to 
> synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in 
> the reader, much of this can be avoided. Especially in a read-only 
> environment for production where you have slaves doing all the high load 
> searching.
> I modified line 67 in MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance 
> improvement, and we got the same query results. I don't believe this will 
> improve the situation for indexes that have deletions.
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.