[ https://issues.apache.org/jira/browse/LUCENE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665697#action_12665697 ]
Marvin Humphrey commented on LUCENE-1476:
-----------------------------------------

Mike McCandless:

>> If we move the deletions filtering up, then we'd increase traffic through
>> that cache
>
> OK, right. So we may have some added cost because of this. I think it's only
> TermScorer that uses the bulk API though.

The original BooleanScorer also pre-fetches. (That doesn't affect KS because
ORScorer, ANDScorer, NOTScorer and RequiredOptionScorer (which have
collectively replaced BooleanScorer) all proceed doc-at-a-time and implement
skipping.)

And I'm still not certain it's a good idea from an API standpoint: it's
strange to have the PostingList and Scorer iterators include deleted docs.

Nevertheless, I've now changed over KS to use the deletions-as-filter approach
in svn trunk. The tie-breaker was related to the ongoing modularization of
IndexReader: if PostingList doesn't have to handle deletions, then
PostingsReader doesn't have to know about DeletionsReader, and if
PostingsReader doesn't have to know about DeletionsReader, then all
IndexReader sub-components can be implemented independently.

The code to implement the deletion skipping turned out to be more verbose and
fiddly than anticipated. It's easy to make fencepost errors when advancing two
iterators in sync, especially when you can only invoke the skipping iterator
method once for a given doc num.

{code}
void
Scorer_collect(Scorer *self, HitCollector *collector, DelEnum *deletions,
               i32_t doc_base)
{
    i32_t doc_num       = 0;
    i32_t next_deletion = deletions ? 0 : I32_MAX;

    /* Execute scoring loop. */
    while (1) {
        if (doc_num > next_deletion) {
            next_deletion = DelEnum_Advance(deletions, doc_num);
            if (next_deletion == 0) { next_deletion = I32_MAX; }
            continue;
        }
        else if (doc_num == next_deletion) {
            /* Skip past deletions. */
            while (doc_num == next_deletion) {
                /* Artificially advance scorer. */
                while (doc_num == next_deletion) {
                    doc_num++;
                    next_deletion = DelEnum_Advance(deletions, doc_num);
                    if (next_deletion == 0) { next_deletion = I32_MAX; }
                }
                /* Verify that the artificial advance actually worked. */
                doc_num = Scorer_Advance(self, doc_num);
                if (doc_num > next_deletion) {
                    next_deletion = DelEnum_Advance(deletions, doc_num);
                }
            }
        }
        else {
            doc_num = Scorer_Advance(self, doc_num + 1);
            if (doc_num >= next_deletion) {
                next_deletion = DelEnum_Advance(deletions, doc_num);
                if (doc_num == next_deletion) { continue; }
            }
        }

        if (doc_num) {
            HC_Collect(collector, doc_num + doc_base, Scorer_Tally(self));
        }
        else {
            break;
        }
    }
}
{code}

> BitVector implement DocIdSet, IndexReader returns DocIdSet deleted docs
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-1476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1476
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Trivial
>         Attachments: LUCENE-1476.patch, LUCENE-1476.patch,
>                      quasi_iterator_deletions.diff, quasi_iterator_deletions_r2.diff
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
> Update BitVector to implement DocIdSet. Expose deleted docs DocIdSet from
> IndexReader.
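
For readers who want to try the two-iterators-in-sync idea from the comment above in isolation, here is a minimal, self-contained sketch. It is not the KS implementation: ArrayIter, ArrayIter_advance and collect_live are hypothetical names invented for this illustration, the iterators are plain sorted arrays, and the loop only filters deleted docs out of the match stream rather than skipping the scorer past runs of deletions as Scorer_collect does. It does keep two conventions from the code above: doc numbers start at 1 with 0 meaning "exhausted", and each iterator is advanced at most once per target doc num.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator over a sorted array of doc numbers.
 * Assumption: doc numbers start at 1 and stay below INT32_MAX. */
typedef struct {
    const int32_t *docs;
    size_t         size;
    size_t         tick;
} ArrayIter;

/* Return the first doc >= target, or 0 once the iterator is exhausted. */
static int32_t
ArrayIter_advance(ArrayIter *self, int32_t target)
{
    while (self->tick < self->size && self->docs[self->tick] < target) {
        self->tick++;
    }
    return self->tick < self->size ? self->docs[self->tick] : 0;
}

/* Copy every match that is not deleted into `out` (at most `out_cap`
 * entries) and return how many were copied.  `deletions` may be NULL. */
static size_t
collect_live(ArrayIter *matches, ArrayIter *deletions, int32_t *out,
             size_t out_cap)
{
    size_t  count         = 0;
    int32_t next_deletion = deletions ? 0 : INT32_MAX;
    int32_t doc_num       = ArrayIter_advance(matches, 1);

    while (doc_num != 0 && count < out_cap) {
        /* Catch the deletions iterator up only when it has fallen behind,
         * so it sees strictly increasing targets. */
        if (next_deletion < doc_num) {
            next_deletion = ArrayIter_advance(deletions, doc_num);
            if (next_deletion == 0) { next_deletion = INT32_MAX; }
        }
        if (doc_num != next_deletion) {
            out[count++] = doc_num;
        }
        doc_num = ArrayIter_advance(matches, doc_num + 1);
    }
    return count;
}
```

With matches at docs 1, 3, 4, 7, 9 and deletions at 3, 4, 8, the function collects 1, 7 and 9. The sketch trades the run-skipping optimization for a single comparison per match, which makes the fencepost conditions easier to check by hand.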