[
https://issues.apache.org/jira/browse/SOLR-16858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17799220#comment-17799220
]
Chris M. Hostetter commented on SOLR-16858:
-------------------------------------------
(NOTE: I'm responding to some questions/comments out of order so that I can
group the answers by logical topic)
----
{quote}I also agree with David, few people know that to run post-filtering you
need ... for sure un-related with vector-based search, but still relevant ...
{quote}
I have conflicting opinions about the API for using PostFilters, but agree we
shouldn't let that bog down this discussion of the knn parser – if one of you
opens an issue regarding that please link from here and i'll try to comment
there w/thoughts later.
----
{quote}In terms of the 'postFiltering bug', once we add the local param, and
remove the slurping, is any bug left?
{quote}
I re-reviewed the code today and I do not believe there is any "postFiltering
bug" on either the {{main}} branch, or the {{jira/SOLR-16858}} branch.
{{KnnQParser}} 's use of {{SolrIndexSearcher.getProcessedFilter}} will ensure
that any PostFilters are excluded from the "pre-filter wrapping".
(We can add a test that proves it by checking the result of
{{KnnVectorQuery.getFilter()}} after using {{KnnQParser}} in a request with
PostFilters)
I made my earlier comment about a (possible) bug in the PostFilter situation
based on some confusion I had about the motivation for your PR1246. I thought
_YOU_ were saying there was a bug with knn + PostFilters, which PR1246 would
fix by delaying the pre-filter wrapping until after SolrIndexSearcher had
computed the PostFilters (and that you had been blocked on committing PR1246 to
fix the "bug" while waiting for the lucene change).
Just a misunderstanding on my part.
----
{quote}I think it makes sense to have a local parameter (which we can call
'preFilter' maybe?)
{quote}
I'm happy to change it to " {{preFilter}} "
----
{quote}This is a breaking change, so we'll have to clarify as a warning
somewhere as users will see a drastic difference (people using FQ as
pre-filters will have to switch to the local parameter instead (but I think
it's fair).
{quote}
I'm not certain if we should _EVER_ remove the fq slurping logic completely –
but I DEFINITELY do not think we should do so in a way that would be a breaking
change on upgrade. (if we're going to do it, it should first be deprecated,
with some new param or config option to enable/disable, and wait for a major
version to change the default)
The reason I'm not convinced we should *EVER* completely remove the fq slurping
ties into your other question...
{quote}But I'm struggling to understand why we would need the include/exclude
tags as local parameters for KNN.
Once we set that the local preFilter is independent of the global FQs, why
should we involve tags?
{quote}
This goes back to those other 2 usecases (or maybe "user profiles" is a better
description) that I mentioned in my first comment on this jira: it makes a lot
of sense for some users to have the slurping of global FQ params – as long as
we give them some control over how it's done...
{quote}... i've talked to some folks who think about using KNN very differently
then i do, who really like the current behavior of slurping up all of the
global fq params, but they really need the ability to exclude a few fq params
based on tags – in the same way that multiselect faceting requires using
{{excludeTags}} as you add {{fq}} params for drilldown. ...
... so that the numFound starts to decrease, as all of the facet drill down fq
params are applied independently to the knn.
...
Part of my idea is to not only support an {{excludeTags}} localparam, but also
an {{includeTags}} param for folks who might have a lot of {{fq}} params that
they use in _every_ request (ie: invariants/appends on the handler), and
if/when the q param uses knn they want a large subset (but not {_}all{_}) of
their fq params to be slurped up by the knn parser.
{quote}
The last part of your question was...
{quote}Is it just to avoid re-writing a pre-filter condition that has been
already specified by a tag in the global?
{quote}
...and I guess that's one way of looking at it – but to me it's more about
giving as much flexibility as possible for how people can specify the
pre-filter.
To go back to the earlier part of your question...
{quote}Once we set that the local preFilter is independent of the global FQs,
why should we involve tags?
{quote}
My thinking about this issue is that {{KnnQParser}} should support multiple
ways of defining the pre-filter that exists on the resulting {{KnnVectorQuery}}
:
* (1) When {{KnnQParser}} is used for the main query string, FQ params are
used to build the _default_ pre-filter on the {{KnnVectorQuery}}
** if you find yourself in a situation where you only want to use _some_ (but
not all) of the global FQs - you can use {{includeTags}} or {{excludeTags}}
localparams to adjust the set of FQ params used
* (2) Regardless of where/how you use {{KnnQParser}} , a {{preFilter}} local
param can be used to override the default behavior and explicit set the
pre-filter on the {{KnnVectorQuery}}
...ie: there doesn't have to be one and only one way of specifying what gets
put in {{KnnVectorQuery.getFilter}} – we can give the user options (and in some
use cases, one option will be easier to use then others)
does that make sense?
----
I'm about to go offline until January, but when I get back I'll have some time
to update the jira branch to:
* change the local param name to " {{preFilter}} "
* add a clear-box test of the post filter situation to prove they are
automatically excluded
* add ref-guide edits of the new params with some examples.
> Allow KnnQParser to selectively apply filters
> ---------------------------------------------
>
> Key: SOLR-16858
> URL: https://issues.apache.org/jira/browse/SOLR-16858
> Project: Solr
> Issue Type: Bug
> Reporter: Joel Bernstein
> Assignee: Chris M. Hostetter
> Priority: Major
> Labels: hybrid-search
> Attachments: SOLR-16858-1.patch, SOLR-16858.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> The KnnQParser is parsing the filter query which limits the rows considered
> by the vector query with the following method:
> {code:java}
> private Query getFilterQuery() throws SolrException, SyntaxError {
> boolean isSubQuery = recurseCount != 0;
> if (!isFilter() && !isSubQuery) {
> String[] filterQueries = req.getParams().getParams(CommonParams.FQ);
> if (filterQueries != null && filterQueries.length != 0) {
> try {
> List<Query> filters = QueryUtils.parseFilterQueries(req);
> SolrIndexSearcher.ProcessedFilter processedFilter =
> req.getSearcher().getProcessedFilter(filters);
> return processedFilter.filter;
> } catch (IOException e) {
> throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, e);
> }
> }
> }
> return null;
> }
> {code}
> This is pulling all filter queries from the main query parameters and using
> them to limit the vector query. This is the automatic behavior of the
> KnnQParser.
> There are cases where you may want to selectively apply different filters.
> One such case is SOLR-16857 which involves reRanking a collapsed query.
> Overriding the default filter behavior could be done by adding an "fq" local
> parameter to the KnnQParser which would override the default filtering
> behavior.
> {code:java}
> {!knn f=vector topK=10 fq=$kfq}[...]&kfq=myquery
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]