[jira] Commented: (LUCENE-1526) For near real-time search, use paged copy-on-write BitVector impl

John Wang (JIRA) Tue, 10 Nov 2009 08:29:58 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775930#action_12775930 ]


John Wang commented on LUCENE-1526:
-----------------------------------

bq.  we need to see it in the real-world context of running actual worst case 
queries.

Isn't checking every document in the corpus for deletes the worst case, e.g. 
the first test?

bq. at the expense of slower query time

According to the test, Zoie's query time is faster.

bq. it must double-check the deletions.

True, but this double-check is only done for a candidate hit from the 
underlying query. Normally the result set is much smaller than the corpus, so 
the overhead is not large. The overhead is 1 array lookup + 1 delset lookup 
vs. 1 bitvector lookup. 
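
To make the cost comparison above concrete, here is a minimal sketch of the two per-candidate checks. It is not Zoie's or Lucene's actual code: the class and method names are hypothetical, java.util.BitSet stands in for Lucene's BitVector, and a HashSet<Long> stands in for the delset.

```java
import java.util.BitSet;
import java.util.Set;

// Hypothetical illustration only; names are not taken from Zoie or Lucene.
class DeleteCheckSketch {

    // "1 array lookup + 1 delset lookup": map the doc id to its UID,
    // then ask the set of deleted UIDs whether it contains that UID.
    static boolean isDeletedViaDelSet(long[] uidByDocId, Set<Long> deletedUids, int docId) {
        long uid = uidByDocId[docId];        // array lookup
        return deletedUids.contains(uid);    // delset lookup
    }

    // "1 bitvector lookup": BitSet is an assumed stand-in for BitVector.
    static boolean isDeletedViaBits(BitSet deletedDocs, int docId) {
        return deletedDocs.get(docId);
    }
}
```

Either check would run only for doc ids that the underlying query has already produced as candidate hits, not for every document in the corpus.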

bq. Can you describe the setup of the "indexing only" test?

We start off with an empty index and keep adding documents. At the same 
time, each search request gets a reader for the current state of the index. 
Our test assumes 10 concurrent threads making search calls.


> For near real-time search, use paged copy-on-write BitVector impl
> -----------------------------------------------------------------
>
>                 Key: LUCENE-1526
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1526
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>         Attachments: LUCENE-1526.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> SegmentReader currently uses a BitVector to represent deleted docs.
> When performing rapid clone (see LUCENE-1314) and delete operations,
> performing a copy on write of the BitVector can become costly because
> the entire underlying byte array must be created and copied. A way to
> make this clone delete process faster is to implement tombstones, a
> term coined by Marvin Humphrey. Tombstones represent new deletions
> plus the incremental deletions from previously reopened readers in
> the current reader. 
> The proposed implementation of tombstones is to accumulate deletions
> into an int array represented as a DocIdSet. With LUCENE-1476,
> SegmentTermDocs iterates over deleted docs using a DocIdSet rather
> than accessing the BitVector by calling get. This allows a BitVector
> and a set of tombstones to be ANDed together as the current reader's
> delete docs. 
> A tombstone merge policy needs to be defined to determine when to
> merge tombstone DocIdSets into a new deleted docs BitVector as too
> many tombstones would eventually be detrimental to performance. A
> probable implementation will merge tombstones based on the number of
> tombstones and the total number of documents in the tombstones. The
> merge policy may be set in the clone/reopen methods or on the
> IndexReader. 
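
For readers who want a picture of the tombstone idea described in the issue, the following is a rough sketch under stated assumptions, not the attached patch. The class name, the maxTombstones threshold, and the count-only merge rule are all hypothetical, and java.util.BitSet stands in for BitVector. A doc is treated as deleted if it appears in either the shared base bits or the tombstone array.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical sketch; not the LUCENE-1526 patch.
class TombstoneSketch {
    private final BitSet baseDeletes;  // shared between clones, never mutated
    private final int[] tombstones;    // sorted doc ids deleted since baseDeletes was built

    TombstoneSketch(BitSet baseDeletes, int[] tombstones) {
        this.baseDeletes = baseDeletes;
        this.tombstones = tombstones;
    }

    boolean isDeleted(int docId) {
        return baseDeletes.get(docId) || Arrays.binarySearch(tombstones, docId) >= 0;
    }

    int tombstoneCount() {
        return tombstones.length;
    }

    // Returns a new reader state with docId deleted. Only the small
    // tombstone array is copied, unless the assumed merge policy fires.
    TombstoneSketch delete(int docId, int maxTombstones) {
        if (isDeleted(docId)) {
            return this;
        }
        // docId is absent, so binarySearch returns -(insertionPoint) - 1.
        int pos = -(Arrays.binarySearch(tombstones, docId) + 1);
        int[] next = new int[tombstones.length + 1];
        System.arraycopy(tombstones, 0, next, 0, pos);
        next[pos] = docId;
        System.arraycopy(tombstones, pos, next, pos + 1, tombstones.length - pos);

        if (next.length > maxTombstones) {
            // Assumed merge policy: fold the tombstones into a fresh bit set.
            BitSet merged = (BitSet) baseDeletes.clone();
            for (int d : next) {
                merged.set(d);
            }
            return new TombstoneSketch(merged, new int[0]);
        }
        return new TombstoneSketch(baseDeletes, next);
    }
}
```

Older reader states keep their own view: a state returned before a later delete does not see that delete, which mirrors the clone/reopen behaviour the description is after.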
