[ https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700577#action_12700577 ]
Michael McCandless commented on LUCENE-1518:
--------------------------------------------

bq. I really hate the hasRandomAccess() approach from an OO design standpoint, but I have to admit that I don't have anything better.

I think we need to approach this as a structural optimization problem; I think there should be an "optimize()" step after (or during) rewrite(). The optimize phase would structure the matching to run as quickly as possible.

I.e., on detecting somehow that a filter is random access, we should multiply it out (down) to each TermScorer. If deletes are pre-multiplied in the filter, we tell each TermScorer not to check deletes. [We may need a filter manager to go along with this, e.g. one that will convert a filter to random-access (if it's going to be reused), multiply in deletes (and re-do that whenever a new reader is opened), etc.]

Likewise, LUCENE-1252 splits matching of queries that consult positional information into two steps (roughly "cheap" and "expensive"), does all "cheap" tests across each "and" clause, and runs the "expensive" ones only when necessary. So optimize() would return two matchers for such queries, and we'd "collate" the cheap matchers together first, followed by the expensive one.

The same goes for not requiring an implicit next() after skipTo(): optimize() would decide which matchers should "drive" the iteration, and which others should do a random-access test. Some next()'s (e.g. OR or AND matchers) are far more costly than other next()'s (e.g. TermScorer). Some are far more restrictive than others, etc.

Of course, some filters require iterator access, so clearly we must accept that.

At some point, there will be too much splintering of options, and source code specialization should [somehow] take over in enumerating all the combinations. E.g. the field-sort collectors are already getting close to this (record score or not, compute max score or not, single field vs multi field, docID required for tie breaking or not, etc).
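As a rough illustration of pushing a random-access filter down to the term level with deletes pre-multiplied, here is a minimal sketch. Everything in it is an assumption made for illustration: `FilterPushdownSketch`, `multiplyInDeletes` and `match` are hypothetical names, not Lucene APIs, and a plain `java.util.BitSet` stands in for a random-access filter and for the deleted-docs set.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical sketch; these names are illustrative and not Lucene APIs. */
final class FilterPushdownSketch {

    /**
     * Done once per reader (e.g. by a hypothetical filter manager):
     * computes "filter AND NOT deleted" so deletes are folded into the filter.
     */
    static BitSet multiplyInDeletes(BitSet filter, BitSet deleted) {
        BitSet live = (BitSet) filter.clone();
        live.andNot(deleted);
        return live;
    }

    /**
     * Stand-in for a term scorer's inner loop. termDocs is the sorted postings
     * list that drives the iteration; the filter is only probed by doc id.
     * There is no separate deleted-docs check here, because the filter passed
     * in is assumed to have deletes multiplied in already.
     */
    static List<Integer> match(int[] termDocs, BitSet liveFilter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : termDocs) {
            if (liveFilter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

The point of the split is that the cost of combining filter and deletes is paid once per reader, while the per-document loop performs a single bit test.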
> Merge Query and Filter classes
> ------------------------------
>
>                 Key: LUCENE-1518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1518
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1518.patch
>
>
> This issue presents a patch that merges Queries and Filters in such a way that
> the new Filter class extends Query. This would make it possible to use every
> filter as a query.
> The new abstract filter class would contain all methods of
> ConstantScoreQuery and deprecate ConstantScoreQuery. Somebody who implements the
> Filter's getDocIdSet()/bits() methods has nothing more to do; he can
> just use the filter as a normal query.
> I do not want to completely convert Filters to ConstantScoreQueries. The idea
> is to combine Queries and Filters in such a way that every Filter can
> automatically be used at all places where a Query can be used (e.g. also
> alone as a search query without any other constraint). For that, the abstract
> Query methods must be implemented and return a "default" weight for Filters,
> which is the current ConstantScore logic. If the filter is used as a real
> filter (where the API wants a Filter), the getDocIdSet part can be used
> directly; the weight is useless (as it is currently, too). The constant score
> default implementation is only used when the Filter is used as a Query (e.g.
> as a direct parameter to Searcher.search()). For the special case of
> BooleanQueries combining Filters and Queries, the idea is to optimize the
> BooleanQuery logic in such a way that it detects whether a BooleanClause is a
> Filter (using instanceof) and then directly uses the Filter API, without taking
> on the burden of the ConstantScoreQuery (see LUCENE-1345).
> Here are some ideas on how to implement Searcher.search() with Query and Filter:
> - User runs Searcher.search() using a Filter as the only parameter. As every
> Filter is also a ConstantScoreQuery, the query can be executed and returns
> score 1.0 for all matching documents.
> - User runs Searcher.search() using a Query as the only parameter: No change,
> all is the same as before.
> - User runs Searcher.search() using a BooleanQuery as parameter: If the
> BooleanQuery does not contain a Query that is a subclass of Filter (the new
> Filter), everything works as usual. If the BooleanQuery contains exactly one
> Filter and nothing else, the Filter is used as a constant score query. If the
> BooleanQuery contains clauses with Queries and Filters, the new algorithm
> could be used: the queries are executed and the results filtered with the
> filters.
> For the user this has the main advantage that he can construct his query
> using a simplified API without thinking about Filters or Queries; he can
> just combine clauses together. The scorer/weight logic then identifies the
> cases in which to use the filter or the query weight API, just like the query
> optimizer of an RDB.
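The three Searcher.search() cases in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchQuery`, `SketchFilter` and `SketchBooleanQuery` are hypothetical stand-ins rather than the classes in the attached patch, a "search" simply returns a map from doc id to score, all boolean clauses are treated as required, and scores of scoring clauses are summed.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for Query: returns docId -> score for all matches. */
abstract class SketchQuery {
    abstract Map<Integer, Float> search(int maxDoc);
}

/** Hypothetical stand-in for the merged Filter: a Query with a constant-score default. */
abstract class SketchFilter extends SketchQuery {
    abstract BitSet getDocIdSet(int maxDoc);

    /** Used only when the filter is run as a query: every match scores 1.0. */
    @Override
    Map<Integer, Float> search(int maxDoc) {
        Map<Integer, Float> hits = new TreeMap<>();
        BitSet docs = getDocIdSet(maxDoc);
        for (int doc = docs.nextSetBit(0); doc >= 0 && doc < maxDoc; doc = docs.nextSetBit(doc + 1)) {
            hits.put(doc, 1.0f);
        }
        return hits;
    }
}

/** Hypothetical conjunction: scoring clauses are executed, filter clauses only restrict. */
final class SketchBooleanQuery extends SketchQuery {
    private final List<SketchQuery> mustClauses = new ArrayList<>();

    SketchBooleanQuery add(SketchQuery clause) {
        mustClauses.add(clause);
        return this;
    }

    @Override
    Map<Integer, Float> search(int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        allowed.set(0, maxDoc);
        Map<Integer, Float> scores = null;
        for (SketchQuery clause : mustClauses) {
            if (clause instanceof SketchFilter) {
                // Filter clause: use the doc id set directly, skip the constant-score path.
                allowed.and(((SketchFilter) clause).getDocIdSet(maxDoc));
            } else if (scores == null) {
                scores = new TreeMap<>(clause.search(maxDoc));
            } else {
                // Further scoring clause: keep docs matched by both and sum the scores.
                Map<Integer, Float> hits = clause.search(maxDoc);
                scores.keySet().retainAll(hits.keySet());
                for (Map.Entry<Integer, Float> e : scores.entrySet()) {
                    e.setValue(e.getValue() + hits.get(e.getKey()));
                }
            }
        }
        if (scores == null) {
            // Only filter clauses (or none): behave like a constant score query.
            scores = new TreeMap<>();
            for (int doc = allowed.nextSetBit(0); doc >= 0; doc = allowed.nextSetBit(doc + 1)) {
                scores.put(doc, 1.0f);
            }
            return scores;
        }
        scores.keySet().removeIf(doc -> !allowed.get(doc));
        return scores;
    }
}
```

In this sketch the `instanceof` check plays the role the description assigns to BooleanQuery: a filter clause never contributes to the score and never goes through the constant-score wrapper unless no scoring clause is present.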