I think a better approach might be a specialized ConcurrentBitSet class designed for frequent reads and infrequent writes (or just make a query not check deletes once it has started).

The ConcurrentHashMap in the JDK could serve as a basis for the implementation.

Then a SegmentDeletes class that extends it (adding the IO functions) would complete the story.
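To make that concrete, here is a rough sketch of the shape I have in mind. This is just an illustration, not a patch: it publishes a copy-on-write long[] through an AtomicReference rather than building directly on ConcurrentHashMap, and all the names are placeholders.

  import java.util.concurrent.atomic.AtomicReference;

  // Sketch only: lock-free reads, infrequent copy-on-write updates.
  public class ConcurrentBitSet {
    // Readers see an immutable snapshot; writers publish a new one.
    private final AtomicReference<long[]> bits;

    public ConcurrentBitSet(int numBits) {
      bits = new AtomicReference<long[]>(new long[(numBits + 63) >> 6]);
    }

    // Hot path: no synchronization at all.
    public boolean get(int index) {
      long[] snapshot = bits.get();
      return (snapshot[index >> 6] & (1L << (index & 63))) != 0;
    }

    // Cold path: writers serialize among themselves, then publish.
    public synchronized void set(int index) {
      long[] copy = bits.get().clone();
      copy[index >> 6] |= 1L << (index & 63);
      bits.set(copy);
    }
  }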

On Jun 26, 2008, at 10:25 AM, Todd Feak (JIRA) wrote:


[ https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Feak updated LUCENE-1316:
------------------------------


I wanted to share my micro load test results with you, to make sure you all understand the scale of the bottleneck as we are experiencing it.

For an optimized index with 4700+ documents (i.e. small), throughput on a NOT query varies by a factor of 35 under heavy load. Using the 2.3.0 release I got 20 tps. With the suggested volatile/synchronized fix, I got 700 tps. The limiting factor at 700 tps was the CPU on the machine generating load, so the real gap may be even wider. Furthermore, the more documents that exist in the index, the worse this may get, as it synchronizes on every single iteration through the loop.

An argument can be made that this is just a necessary evil, and that we *must* synchronize on this to allow for updates during reads. I have 2 questions regarding that.

1. What is the cost of a dirty read in that case? Is it stale data, incorrect data, or a corrupted system?
2. What is more prevalent in a production system: indexes with no deletes, indexes with *some* deletes, or indexes with frequent deletes?

Do we need to have 1 class that does it all, or should we consider 2 different implementations for 2 different uses? What about a read-only SegmentReader for read-only slaves in production environments?
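Purely as a hypothetical sketch of that last idea (ReadOnlyReader is a made-up name, and I am assuming a FilterIndexReader-style wrapper that snapshots the deletions once up front):

  import java.util.BitSet;
  import org.apache.lucene.index.FilterIndexReader;
  import org.apache.lucene.index.IndexReader;

  // Sketch only: assumes the underlying index never changes while this
  // wrapper is in use, so deletions can be snapshotted once at startup.
  public class ReadOnlyReader extends FilterIndexReader {
    private final BitSet deleted;

    public ReadOnlyReader(IndexReader in) {
      super(in);
      deleted = new BitSet(in.maxDoc());
      for (int i = 0; i < in.maxDoc(); i++) {
        if (in.isDeleted(i)) {
          deleted.set(i);
        }
      }
    }

    public boolean hasDeletions() {
      return !deleted.isEmpty();
    }

    // Lock-free; never touches the synchronized SegmentReader path.
    public boolean isDeleted(int n) {
      return deleted.get(n);
    }
  }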




Avoidable synchronization bottleneck in MatchAllDocsQuery$MatchAllScorer
-------------------------------------------------------------------------

                Key: LUCENE-1316
                 URL: https://issues.apache.org/jira/browse/LUCENE-1316
            Project: Lucene - Java
         Issue Type: Bug
         Components: Query/Scoring
   Affects Versions: 2.3
        Environment: All
           Reporter: Todd Feak
           Priority: Minor
        Attachments: MatchAllDocsQuery.java

  Original Estimate: 1h
 Remaining Estimate: 1h

The isDeleted() method on IndexReader has been mentioned a number of times as a potential synchronization bottleneck. However, the reason this bottleneck occurs is actually at a higher level that wasn't focused on (at least in the threads I read). In every case I saw where a stack trace was provided to show the lock/block, higher in the stack you see the MatchAllScorer.next() method. In Solr particularly, this scorer is used for "NOT" queries. We saw incredibly poor performance (an order of magnitude) on our load tests for NOT queries, due to this bottleneck.

The problem is that every single document is run through this isDeleted() method, which is synchronized. Having an optimized index exacerbates this issue, as there is only a single SegmentReader to synchronize on, causing a major thread pileup waiting for the lock. By simply having the MatchAllScorer check whether there have been any deletions in the reader, much of this can be avoided, especially in a read-only production environment where slaves do all the high-load searching.
I modified line 67 of the MatchAllDocsQuery class
FROM:
  if (!reader.isDeleted(id)) {
TO:
  if (!reader.hasDeletions() || !reader.isDeleted(id)) {
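For context, the surrounding next() loop ends up looking roughly like this (paraphrased rather than the exact 2.3.0 source; the id/maxDoc bookkeeping here is simplified):

  public boolean next() throws IOException {
    while (id < maxDoc - 1) {
      id++;
      // The new guard skips the synchronized isDeleted() call entirely
      // when the reader reports no deletions (the common read-only case),
      // so threads no longer pile up on the SegmentReader lock.
      if (!reader.hasDeletions() || !reader.isDeleted(id)) {
        return true;
      }
    }
    return false;
  }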
In our micro load test for NOT queries only, this was a major performance improvement. We also got the same query results. I don't believe this will improve the situation for indexes that have deletions.
Please consider making this adjustment for a future bug fix release.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


