[ 
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-1316:
---------------------------------

    Attachment: LUCENE_1316.patch

Attaching prototype patch (needs javadoc + tests if approach is acceptable) 
that avoids all synchronization when iterating over all documents.

If IndexReader.termDocs(Term term) is called with null, a TermDocs 
implementation is returned that matches all documents.  This is the same 
approach used by TermScorer via SegmentTermDocs to avoid synchronization by 
grabbing the BitVector of deleted docs at instantiation.

This patch also updates MatchAllDocsQuery to use this TermDocs to iterate over 
all documents.
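
To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. It is not the code in the attached patch and does not use Lucene's classes: SketchReader, AllDocsIterator, allDocs() and liveDocs() are hypothetical stand-ins, and java.util.BitSet stands in for the BitVector of deleted docs. The point it illustrates is that the iterator grabs the deleted-docs bits once at instantiation, so next() never has to call a synchronized isDeleted().

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a segment reader; not a Lucene class. */
final class SketchReader {
    private final int maxDoc;
    private final BitSet deleted; // null means the reader has no deletions

    SketchReader(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    /** Per-document check: every call contends for the reader's lock. */
    synchronized boolean isDeleted(int doc) {
        return deleted != null && deleted.get(doc);
    }

    /** Single synchronized call that hands out the deleted-docs reference. */
    private synchronized BitSet deletedDocs() {
        return deleted;
    }

    /** Rough analogue of termDocs(null): iterates over all live documents. */
    AllDocsIterator allDocs() {
        return new AllDocsIterator(maxDoc, deletedDocs());
    }

    /** Convenience helper for the sketch: collects all live doc ids. */
    List<Integer> liveDocs() {
        List<Integer> docs = new ArrayList<>();
        AllDocsIterator it = allDocs();
        while (it.next()) {
            docs.add(it.doc());
        }
        return docs;
    }
}

/** Iterator shaped like TermDocs (next()/doc()), matching every live doc. */
final class AllDocsIterator {
    private final int maxDoc;
    private final BitSet deleted; // captured once at construction; may be null
    private int doc = -1;

    AllDocsIterator(int maxDoc, BitSet deleted) {
        this.maxDoc = maxDoc;
        this.deleted = deleted;
    }

    /** Advances to the next non-deleted document without taking any lock. */
    boolean next() {
        doc++;
        while (deleted != null && doc < maxDoc && deleted.get(doc)) {
            doc++;
        }
        return doc < maxDoc;
    }

    /** Current document id; only meaningful after next() returned true. */
    int doc() {
        return doc;
    }
}
```

The sketch assumes the deleted-docs bits are not modified while an iterator is in use. Only the call that creates the iterator is synchronized; the per-document loop is lock-free.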

Advantages:
  - adds no new methods or interfaces, simply adds extra semantics to an 
existing method
  - works from the bottom-up... no need to instantiate a big BitVector
  - exposes the functionality to expert users for use in custom queries
  - avoids a binary search to find the correct IndexReader in a MultiReader for 
each call (it leverages all the TermDocs code in all IndexReader 
implementations such as MultiTermDocs).

Disadvantages:
  - TermDocs instance returned cannot be used to seek to a different term.  
However, this is minor and not a back compatibility concern since "null" was 
not previously a supported value.

On balance, I think it's 10% hack, 90% useful.  Thoughts?

> Avoidable synchronization bottleneck in MatchAlldocsQuery$MatchAllScorer
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-1316
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Query/Scoring
>    Affects Versions: 2.3
>         Environment: All
>            Reporter: Todd Feak
>            Priority: Minor
>         Attachments: LUCENE_1316.patch, MatchAllDocsQuery.java
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The isDeleted() method on IndexReader has been mentioned a number of times as 
> a potential synchronization bottleneck. However, the reason this bottleneck 
> occurs is actually at a higher level that wasn't focused on (at least in the 
> threads I read).
> In every case I saw where a stack trace was provided to show the lock/block, 
> higher in the stack you see the MatchAllScorer.next() method. In Solr 
> particularly, this scorer is used for "NOT" queries. We saw incredibly poor 
> performance (order of magnitude) on our load tests for NOT queries, due to 
> this bottleneck. The problem is that every single document is run through 
> this isDeleted() method, which is synchronized. Having an optimized index 
> exacerbates this issue, as there is only a single SegmentReader to 
> synchronize on, causing a major thread pileup waiting for the lock.
> By simply having the MatchAllScorer see if there have been any deletions in 
> the reader, much of this can be avoided. Especially in a read-only 
> environment for production where you have slaves doing all the high load 
> searching.
> I modified line 67 in the MatchAllDocsQuery
> FROM:
>   if (!reader.isDeleted(id)) {
> TO:
>   if (!reader.hasDeletions() || !reader.isDeleted(id)) {
> In our micro load test for NOT queries only, this was a major performance 
> improvement.  We also got the same query results. I don't believe this will 
> improve the situation for indexes that have deletions. 
> Please consider making this adjustment for a future bug fix release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


