Dmitry Tikhonov created SOLR-18227:
--------------------------------------
Summary: Support named queries in Solr
Key: SOLR-18227
URL: https://issues.apache.org/jira/browse/SOLR-18227
Project: Solr
Issue Type: Improvement
Components: query parsers
Affects Versions: 9.10.1, 10.1
Reporter: Dmitry Tikhonov
Solr has no built-in way to answer "which query clauses actually matched this
document?" for a given result set. This is a common need in relevance
debugging, A/B testing pipelines, and rules-based boosting:
you want to know not just that document 42 scored 3.7, but that it matched
the "brand_exact" clause and the "recency_boost" clause, while not matching the
"in_stock" clause.
Lucene has provided the NamedMatches API since Lucene 8 precisely for this
purpose, but Solr has never exposed it.
*Proposed Solution*
1. *_name* *local-param on query parsers* — Add a _name local param to a
focused set of widely-used query parsers. When present, the parser wraps its
result with NamedMatches.wrapQuery(name, query) so the name
travels with the query through scoring and can be recovered post-search:
{code}
q={!bool _name=all_books
should='{!term _name=fantasy f=cat}fantasy'
should='{!term _name=scifi f=cat}scifi'}
{code}
Supported parsers: term, terms, bool, lucene, prefix, dismax, edismax,
fuzzy.
2. *MatchedQueriesComponent* — A new SearchComponent activated by
matched_queries=true (alias mq=true) that performs a lightweight second pass
over the top-N hits using Weight.matches(). It reports which named
clauses fired per document and as an aggregate summary:
{code:json}
"matched_queries_per_hit": {
"1": ["all_books", "fantasy"],
"5": ["all_books", "scifi"]
},
"matched_queries_summary": {
"all_books": { "count": 7, "docIds": ["1","2","3","4","5","6","7"] },
"fantasy": { "count": 4, "docIds": ["1","2","3","4"] },
"scifi": { "count": 3, "docIds": ["5","6","7"] }
}
{code}
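As an illustration of how the two response sections relate, the summary can be derived by inverting the per-hit map. The sketch below uses only the JDK; the class and method names (MatchedQueriesSummary, Entry, summarize) are hypothetical and are not taken from the proposed patch, which may build the summary differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of the proposed patch. */
final class MatchedQueriesSummary {

    /** One summary row: how many hits matched a name, and which ones. */
    static final class Entry {
        final int count;
        final List<String> docIds;

        Entry(List<String> docIds) {
            this.docIds = List.copyOf(docIds);
            this.count = this.docIds.size();
        }
    }

    /** Inverts docId -> names into name -> (count, docIds), keeping hit order. */
    static Map<String, Entry> summarize(Map<String, List<String>> perHit) {
        Map<String, List<String>> docsByName = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> hit : perHit.entrySet()) {
            for (String name : hit.getValue()) {
                docsByName.computeIfAbsent(name, k -> new ArrayList<>()).add(hit.getKey());
            }
        }
        Map<String, Entry> summary = new LinkedHashMap<>();
        docsByName.forEach((name, ids) -> summary.put(name, new Entry(ids)));
        return summary;
    }

    public static void main(String[] args) {
        // Same shape as "matched_queries_per_hit" in the sample response.
        Map<String, List<String>> perHit = new LinkedHashMap<>();
        perHit.put("1", List.of("all_books", "fantasy"));
        perHit.put("5", List.of("all_books", "scifi"));
        summarize(perHit).forEach((name, e) ->
            System.out.println(name + ": count=" + e.count + " docIds=" + e.docIds));
    }
}
```

Insertion-ordered maps keep the summary in the order names are first seen across hits, which makes the output stable for a given result order.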
*Implementation Notes*
- The second pass uses *Weight.matches(LeafReaderContext, docId)*, the same
API used by highlighters, where docId is relative to the leaf (segment)
rather than the whole index. It performs per-document posting-list seeks over
the top-N result set only, not a full re-scan of the index.
- *ScoreMode.COMPLETE_NO_SCORES* is used for the matches weight, allowing
Lucene to skip score computation entirely.
- localParams null-safety: all parsers guard localParams != null before
reading _name so the feature is inert when a parser is used as a defType
default (where localParams is null).
- *MatchedQueriesComponent* must be registered in solrconfig.xml and added to
a request handler's component chain. It is a no-op unless matched_queries=true
is present on the request.
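
The notes above can be modelled in a few lines of plain Java. This is a toy stand-in, not Lucene or Solr code: Clause, term, anyOf, named, wrapIfNamed and secondPass are all hypothetical names, a document is reduced to a set of terms, and a plain Map stands in for Solr's local params. It only shows the shape of the idea: a name is attached by wrapping a clause, the wrap is skipped when localParams is null, and names are collected by re-checking the top hits only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of named clauses; none of these types are Lucene or Solr classes. */
final class NamedMatchModel {

    interface Clause {
        boolean matches(Set<String> docTerms);

        /** Appends the names of all named clauses that matched this document. */
        void collectNames(Set<String> docTerms, List<String> out);
    }

    /** Unnamed leaf clause: matches when the document contains the term. */
    static Clause term(String t) {
        return new Clause() {
            public boolean matches(Set<String> docTerms) { return docTerms.contains(t); }
            public void collectNames(Set<String> docTerms, List<String> out) { }
        };
    }

    /** Disjunction, loosely analogous to a bool query with only "should" clauses. */
    static Clause anyOf(Clause... clauses) {
        return new Clause() {
            public boolean matches(Set<String> docTerms) {
                for (Clause c : clauses) {
                    if (c.matches(docTerms)) return true;
                }
                return false;
            }
            public void collectNames(Set<String> docTerms, List<String> out) {
                for (Clause c : clauses) c.collectNames(docTerms, out);
            }
        };
    }

    /** Wrapper that carries a name; plays the role of NamedMatches.wrapQuery in this model. */
    static Clause named(String name, Clause inner) {
        return new Clause() {
            public boolean matches(Set<String> docTerms) { return inner.matches(docTerms); }
            public void collectNames(Set<String> docTerms, List<String> out) {
                if (inner.matches(docTerms)) {
                    out.add(name);
                    inner.collectNames(docTerms, out);
                }
            }
        };
    }

    /** Null-safe guard: with no local params, or no _name, the clause is returned unchanged. */
    static Clause wrapIfNamed(Map<String, String> localParams, Clause clause) {
        String name = localParams == null ? null : localParams.get("_name");
        return name == null ? clause : named(name, clause);
    }

    /** Second pass over the top hits only (docId -> terms), never the whole index. */
    static Map<String, List<String>> secondPass(Clause query, Map<String, Set<String>> topHits) {
        Map<String, List<String>> perHit = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> hit : topHits.entrySet()) {
            List<String> names = new ArrayList<>();
            query.collectNames(hit.getValue(), names);
            perHit.put(hit.getKey(), names);
        }
        return perHit;
    }
}
```

In the real implementation the per-document check would go through Weight.matches() on a weight created with ScoreMode.COMPLETE_NO_SCORES, as described in the notes; the model replaces that with a simple set lookup.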