[jira] Commented: (LUCENE-1518) Merge Query and Filter classes

Shai Erera (JIRA) Wed, 29 Apr 2009 23:35:57 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704499#action_12704499
 ]


Shai Erera commented on LUCENE-1518:
------------------------------------

I would like to ask why we need to make Filter and Query the same type. 
After all, they do different things, even though they look similar. 
Attempting to merge them yields these peculiarities:
# If Filter extends Query, it now has to implement all sorts of methods like 
weight, toString, rewrite, getTerms and scoresDocInOrder (an addition from 
LUCENE-1593).
# If Query extends Filter, it has to implement getDocIdSet.
# Either way, we introduce instanceof checks in places just to check whether 
a given Query is actually a Filter.

Both (1) and (2) are completely redundant for both Query and Filter, i.e. why 
should Filter implement toString(term) or scoresDocInOrder when it does not 
score docs? Why should Query implement getDocIdSet when it already implements 
weight().scorer(), which returns a DocIdSetIterator?
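
To illustrate the second question, here is a minimal sketch. DocIterator and SimpleScorer are hypothetical stand-ins that only mirror the shape of DocIdSetIterator and Scorer, not the real Lucene classes. The point is that code which needs only doc ids can consume a scorer directly, so a separate getDocIdSet on Query would add nothing.

```java
import java.util.ArrayList;
import java.util.List;

public class ScorerIsIteratorSketch {
    /** Hypothetical stand-in for an iterator over matching doc ids. */
    public static abstract class DocIterator {
        /** Moves to the next matching doc; returns false when exhausted. */
        public abstract boolean next();
        /** Current doc id; only valid after next() returned true. */
        public abstract int doc();
    }

    /** Hypothetical stand-in for a scorer: an iterator that can also score. */
    public static abstract class SimpleScorer extends DocIterator {
        public abstract float score();
    }

    /** Needs doc ids only, so it accepts any DocIterator, scorers included. */
    public static List<Integer> collectDocs(DocIterator it) {
        List<Integer> docs = new ArrayList<Integer>();
        while (it.next()) {
            docs.add(it.doc());
        }
        return docs;
    }

    /** Builds a scorer over parallel arrays of doc ids and scores. */
    public static SimpleScorer scorerOver(final int[] docs, final float[] scores) {
        return new SimpleScorer() {
            private int pos = -1;
            public boolean next() { return ++pos < docs.length; }
            public int doc() { return docs[pos]; }
            public float score() { return scores[pos]; }
        };
    }
}
```

Passing the result of scorerOver(...) to collectDocs(...) works without any conversion, because the scorer already is an iterator.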

I read the different posts on this issue and I don't understand why we think 
that the API is not clear enough today, or is not convenient:

* If I want to just filter the entire index, I have two ways: (1) execute a 
search with MatchAllDocsQuery and a Filter, or (2) wrap the Filter with 
ConstantScoreQuery. I don't see the difference between the two, and I don't 
think it forces any major/difficult decision on the user.
* If I want to have a BooleanQuery with several clauses and I want a clause to 
be a complex one with a Filter, I can wrap the Filter with CSQ.
* If I want to filter a Query, there is already API today on Searcher which 
accepts both Query and Filter.
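
The wrapping idea behind CSQ can be sketched as follows. This is only a rough model under stated assumptions: SimpleFilter, SimpleQuery, filterOf and constantScore are hypothetical names invented for the illustration, not Lucene API, and the real ConstantScoreQuery works through Weight and Scorer rather than a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConstantScoreSketch {
    /** Hypothetical stand-in for a filter: it only says which doc ids match. */
    public interface SimpleFilter {
        int[] matchingDocs();
    }

    /** Hypothetical stand-in for a query: matching doc ids with their scores. */
    public interface SimpleQuery {
        Map<Integer, Float> scoredDocs();
    }

    /** Convenience factory for a filter over a fixed set of doc ids. */
    public static SimpleFilter filterOf(final int... docs) {
        return new SimpleFilter() {
            public int[] matchingDocs() { return docs; }
        };
    }

    /** Wraps a filter so it can be used where a query is required. */
    public static SimpleQuery constantScore(final SimpleFilter filter, final float score) {
        return new SimpleQuery() {
            public Map<Integer, Float> scoredDocs() {
                Map<Integer, Float> out = new LinkedHashMap<Integer, Float>();
                for (int doc : filter.matchingDocs()) {
                    out.put(doc, score); // every match gets the same score
                }
                return out;
            }
        };
    }
}
```

The filter itself stays a pure filter; the wrapper is the only place that knows about scores, which is why the decision costs the user a single line.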

At least as I understand it, Queries are supposed to score documents, while 
Filters are supposed to just filter. If there is an API which requires Queries 
only, then I can wrap my Filter with CSQ, but I'd prefer to check if we can 
change that API first (for example, allowing BooleanClause to accept a Filter, 
and implementing a weight(IndexReader) rather than just getQuery()).

So if Filters just filter and Queries just score, the API on both is very 
clear: Filter returns a DISI and Query returns a Scorer (which is also a DISI). 
I don't see the advantage of having the code unaware of the fact that a certain 
Query is actually a Filter - I prefer it to be upfront. That way, we can do all 
sorts of optimizations, like asking the Filter for next() first, if we know 
it's supposed to filter out most of the documents.
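
A minimal sketch of that kind of optimization, assuming a simplified iterator with a single advance(target) method. DocIterator, over and intersect are hypothetical names for this illustration, not Lucene's DocIdSetIterator API. The filter drives the loop and the query iterator is only asked to catch up, so a selective filter lets the query side skip most documents.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterFirstSketch {
    /** Hypothetical stand-in for a doc id iterator. */
    public interface DocIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;

        /** Returns the first doc id >= target, or NO_MORE_DOCS if there is none. */
        int advance(int target);
    }

    /** Iterator over a sorted array of doc ids. */
    public static DocIterator over(final int... sortedDocs) {
        return new DocIterator() {
            private int pos = 0;
            public int advance(int target) {
                while (pos < sortedDocs.length && sortedDocs[pos] < target) {
                    pos++;
                }
                return pos < sortedDocs.length ? sortedDocs[pos] : NO_MORE_DOCS;
            }
        };
    }

    /** Intersects both iterators, always asking the filter first. */
    public static List<Integer> intersect(DocIterator filter, DocIterator query) {
        List<Integer> hits = new ArrayList<Integer>();
        int doc = filter.advance(0);
        while (doc != DocIterator.NO_MORE_DOCS) {
            int q = query.advance(doc);        // query only catches up to the filter
            if (q == doc) {
                hits.add(doc);                 // both sides agree on this doc
                doc = filter.advance(doc + 1);
            } else {
                doc = filter.advance(q);       // filter jumps to the query's position
            }
        }
        return hits;
    }
}
```

This only works when the calling code knows which side is the filter, which is the argument for keeping the two types distinct.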

At the end of the day, both Filter and Query iterate on documents. The 
difference lies in the purpose of iteration. In my code there are several Query 
implementations today that just filter documents, and I plan to change all of 
them to implement Filter instead (they were originally written as Queries 
because Filter had just bits(); the iterator() version is now more efficient, 
at least for me). I want to do this for a couple of reasons, clarity being one of 
the most important. If Filter just filters, I don't see why it should inherit 
all the methods from Query (or vice versa BTW), especially when I have this CSQ 
wrapper.
To me, as a Lucene user, I make far more complicated decisions every day than 
deciding whether I want to use a Filter as a Query or not. If I pass it 
directly to IndexSearcher, I use it as a filter. If I use a different API which 
accepts just Query, I wrap it with CSQ. As simple as that.

But that's just my two cents.

> Merge Query and Filter classes
> ------------------------------
>
>                 Key: LUCENE-1518
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1518
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>             Fix For: 2.9
>
>         Attachments: LUCENE-1518.patch
>
>
> This issue presents a patch, that merges Queries and Filters in a way, that 
> the new Filter class extends Query. This would make it possible, to use every 
> filter as a query.
> The new abstract filter class would contain all methods of 
> ConstantScoreQuery, deprecate ConstantScoreQuery. If somebody implements the 
> Filter's getDocIdSet()/bits() methods he has nothing more to do, he could 
> just use the filter as a normal query.
> I do not want to completely convert Filters to ConstantScoreQueries. The idea 
> is to combine Queries and Filters in such a way, that every Filter can 
> automatically be used at all places where a Query can be used (e.g. also 
> alone a search query without any other constraint). For that, the abstract 
> Query methods must be implemented and return a "default" weight for Filters 
> which is the current ConstantScore Logic. If the filter is used as a real 
> filter (where the API wants a Filter), the getDocIdSet part could be directly 
> used, the weight is useless (as it is currently, too). The constant score 
> default implementation is only used when the Filter is used as a Query (e.g. 
> as direct parameter to Searcher.search()). For the special case of 
> BooleanQueries combining Filters and Queries the idea is, to optimize the 
> BooleanQuery logic in such a way, that it detects if a BooleanClause is a 
> Filter (using instanceof) and then directly uses the Filter API and not take 
> the burden of the ConstantScoreQuery (see LUCENE-1345).
> Here are some ideas how to implement Searcher.search() with Query and Filter:
> - User runs Searcher.search() using a Filter as the only parameter. As every 
> Filter is also a ConstantScoreQuery, the query can be executed and returns 
> score 1.0 for all matching documents.
> - User runs Searcher.search() using a Query as the only parameter: No change, 
> all is the same as before
> - User runs Searcher.search() using a BooleanQuery as parameter: If the 
> BooleanQuery does not contain a Query that is subclass of Filter (the new 
> Filter) everything as usual. If the BooleanQuery only contains exactly one 
> Filter and nothing else the Filter is used as a constant score query. If 
> BooleanQuery contains clauses with Queries and Filters the new algorithm 
> could be used: The queries are executed and the results filtered with the 
> filters.
> For the user this has the main advantage: That he can construct his query 
> using a simplified API without thinking about Filters or Queries, you can 
> just combine clauses together. The scorer/weight logic then identifies the 
> cases to use the filter or the query weight API. Just like the query 
> optimizer of a RDB.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


