[jira] Commented: (LUCENE-1476) BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs

Marvin Humphrey (JIRA) Tue, 20 Jan 2009 20:24:22 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665697#action_12665697
 ]


Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Mike McCandless:

>> If we move the deletions filtering up, then we'd increase traffic through
>> that cache
>
> OK, right. So we may have some added cost because of this. I think it's only
> TermScorer that uses the bulk API though.

The original BooleanScorer also pre-fetches.  (That doesn't affect KS because
ORScorer, ANDScorer, NOTScorer and RequiredOptionScorer (which have
collectively replaced BooleanScorer) all proceed doc-at-a-time and implement
skipping.)

And I'm still not certain it's a good idea from an API standpoint: it's
strange to have the PostingList and Scorer iterators include deleted docs.

Nevertheless, I've now changed over KS to use the deletions-as-filter approach
in svn trunk. The tie-breaker was related to the ongoing modularization of
IndexReader: if PostingList doesn't have to handle deletions, then
PostingsReader doesn't have to know about DeletionsReader, and if
PostingsReader doesn't have to know about DeletionsReader, then all
IndexReader sub-components can be implemented independently.

The code to implement the deletion skipping turned out to be more verbose and
fiddly than anticipated.  It's easy to make fencepost errors when dealing with
advancing two iterators in sync, especially when you can only invoke the
skipping iterator method once for a given doc num.

{code}
void
Scorer_collect(Scorer *self, HitCollector *collector, DelEnum *deletions,
               i32_t doc_base)
{
    i32_t   doc_num        = 0;
    i32_t   next_deletion  = deletions ? 0 : I32_MAX;

    /* Execute scoring loop. */
    while (1) {
        if (doc_num > next_deletion) {
            next_deletion = DelEnum_Advance(deletions, doc_num);
            if (next_deletion == 0) { next_deletion = I32_MAX; }
            continue;
        }
        else if (doc_num == next_deletion) {
            /* Skip past deletions. */
            while (doc_num == next_deletion) {
                /* Artificially advance scorer. */
                while (doc_num == next_deletion) {
                    doc_num++;
                    next_deletion = DelEnum_Advance(deletions, doc_num);
                    if (next_deletion == 0) { next_deletion = I32_MAX; }
                }
                /* Verify that the artificial advance actually worked. */
                doc_num = Scorer_Advance(self, doc_num);
                if (doc_num > next_deletion) {
                    next_deletion = DelEnum_Advance(deletions, doc_num);
                }
            }
        }
        else {
            doc_num = Scorer_Advance(self, doc_num + 1);
            if (doc_num >= next_deletion) { 
                next_deletion = DelEnum_Advance(deletions, doc_num);
                if (doc_num == next_deletion) { continue; }
            }
        }

        if (doc_num) {
            HC_Collect(collector, doc_num + doc_base, Scorer_Tally(self));
        }
        else { 
            break; 
        }
    }
}
{code}
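
For readers who want to experiment with the two-iterators-in-sync problem outside of KS, here is a minimal, self-contained sketch. It is not the KS implementation: ArrayIter, iter_advance and collect_live are hypothetical names, both the scorer and the deletions enumerator are modeled as cursors over sorted arrays, and 0 is assumed to mean "exhausted", as in the loop above. Unlike the code above, it tests each hit against the next deletion instead of artificially advancing the scorer past a run of deletions, which is simpler but gives up that optimization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for both Scorer and DelEnum: a cursor over a sorted
 * array of doc nums.  Doc nums are assumed to start at 1 and stay below
 * INT32_MAX; 0 means "exhausted". */
typedef struct {
    const int32_t *docs;
    size_t         size;
    size_t         tick;
} ArrayIter;

/* Return the first doc num >= target, or 0 if none remain. */
static int32_t
iter_advance(ArrayIter *it, int32_t target)
{
    while (it->tick < it->size && it->docs[it->tick] < target) {
        it->tick++;
    }
    return it->tick < it->size ? it->docs[it->tick] : 0;
}

/* Copy every undeleted hit into `out` and return how many were written.
 * `deletions` may be NULL, meaning nothing has been deleted. */
static size_t
collect_live(ArrayIter *scorer, ArrayIter *deletions, int32_t *out, size_t cap)
{
    size_t  count         = 0;
    int32_t next_deletion = deletions ? 0 : INT32_MAX;
    int32_t doc_num       = iter_advance(scorer, 1);

    while (doc_num != 0 && count < cap) {
        /* Catch the deletions cursor up, at most once per doc num. */
        if (next_deletion < doc_num) {
            next_deletion = iter_advance(deletions, doc_num);
            if (next_deletion == 0) { next_deletion = INT32_MAX; }
        }
        /* Only collect hits that are not the current deletion. */
        if (doc_num != next_deletion) {
            out[count++] = doc_num;
        }
        doc_num = iter_advance(scorer, doc_num + 1);
    }
    return count;
}
```

With hits 1, 3, 4, 7, 9 and deletions 3, 4, 8, 9, collect_live writes 1 and 7 to the output buffer. Normalizing an exhausted deletions cursor to INT32_MAX in exactly one place is what keeps the comparison logic free of the fencepost cases described above.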



> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch, 
> quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet.  Expose deleted docs DocIdSet from 
> IndexReader.

