[ https://issues.apache.org/jira/browse/LUCY-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879468#action_12879468 ]
Marvin Humphrey commented on LUCY-111:
--------------------------------------

There are two interface issues in this Matcher implementation which we should revisit at some point in the future. We should not attempt to resolve them before the initial release, because they will require extended benchmarking and experimentation.

First, this implementation of Matcher uses 0 as a sentinel rather than Integer.MAX_VALUE a la Lucene. Lucy uses 0 to represent "invalid doc id", and Next() and Advance() return doc ids, so we can treat doc id 0 as "false" to indicate that the Matcher is exhausted. That gives us an intuitive iterator interface:

{code:none}
while (my $doc_id = $matcher->next) {
    ...
}
{code}

Integer.MAX_VALUE was chosen for Lucene in order to optimize certain constructs; furthermore, 0 is a valid doc ID in Lucene, so using it as a sentinel isn't an option there. For the extended discussion, see LUCENE-1614.

Second, this implementation's Collect() method uses separately iterated deletions. The advantage of this stratagem for now is that none of our low-level Matchers have to worry about deletions. However, iterated deletions did not perform as well as random-access deletions in some benchmarks run by Mike McCandless for Lucene in LUCENE-1476, and they make the Matcher's iterator somewhat more awkward to use directly if you want to avoid deletions. There may be further opportunities for optimization a la LUCENE-1536 as well.

> Matcher
> -------
>
> Key: LUCY-111
> URL: https://issues.apache.org/jira/browse/LUCY-111
> Project: Lucy
> Issue Type: New Feature
> Components: Core - Search
> Reporter: Marvin Humphrey
> Assignee: Marvin Humphrey
> Priority: Blocker
> Attachments: Matcher.bp, Matcher.c
>
>
> A Matcher is an object which matches a set of Lucy doc ids, iterating over
> them via Next() and Advance(). It combines the roles of Lucene's
> DocIdSetIterator and Scorer classes.
> Some -- but not all -- Matchers implement a Score() method.
> We can refer to such Matchers informally as "scorers", but Lucy won't need a
> Scorer class a la Lucene. In Lucy, Query classes will compile down to
> Matchers that either Score() or don't. This allows us to perform
> optimizations on branches of compound scorers: compiling "foo AND NOT bar"
> will produce a scoring Matcher for "foo" and a non-scoring Matcher for
> "bar", since the "bar" branch can never contribute to the score.