Dmitry Tikhonov created SOLR-18227:
--------------------------------------

             Summary: Support named queries in Solr
                 Key: SOLR-18227
                 URL: https://issues.apache.org/jira/browse/SOLR-18227
             Project: Solr
          Issue Type: Improvement
          Components: query parsers
    Affects Versions: 9.10.1, 10.1
            Reporter: Dmitry Tikhonov


  Solr has no built-in way to answer "which query clauses actually matched this 
document?" for a given result set. This is a common need in relevance 
debugging, A/B testing pipelines, and rules-based boosting: you want to know 
not just that document 42 scored 3.7, but that it matched the "brand_exact" 
clause and the "recency_boost" clause while not matching the "in_stock" clause.

  Lucene has provided the NamedMatches API since Lucene 8 precisely for this 
purpose, but Solr has never exposed it.

  *Proposed Solution*

  1. *_name local param on query parsers*: Add a _name local param to a 
focused set of widely used query parsers. When present, the parser wraps its 
result with NamedMatches.wrapQuery(name, query) so that the name travels with 
the query through scoring and can be recovered after the search:

{code:java}
q={!bool _name=all_books 
should='{!term _name=fantasy f=cat}fantasy'
should='{!term _name=scifi   f=cat}scifi'}       
{code}
  Supported parsers: term, terms, bool, lucene, prefix, dismax, edismax, 
fuzzy.

  2. *MatchedQueriesComponent*: A new SearchComponent, activated by 
matched_queries=true (alias mq=true), that performs a lightweight second pass 
over the top-N hits using Weight.matches(). It reports which named clauses 
fired, both per document and as an aggregate summary:

{code:java}
"matched_queries_per_hit": {
  "1": ["all_books", "fantasy"],
  "5": ["all_books", "scifi"]
},
"matched_queries_summary": {
  "all_books": { "count": 7, "docIds": ["1","2","3","4","5","6","7"] },
  "fantasy":   { "count": 4, "docIds": ["1","2","3","4"] },
  "scifi":     { "count": 3, "docIds": ["5","6","7"] }
}
{code}
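
  To illustrate how the two response sections relate, here is a minimal sketch 
of deriving the summary from the per-hit map. It uses only the JDK; the class 
and method names (MatchedQueriesSummary, Summary, summarize) are hypothetical 
and are not taken from the proposed patch, which may aggregate differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the proposed patch.
public class MatchedQueriesSummary {

    /** Aggregate for one named clause: the ids of the hits it matched. */
    public static final class Summary {
        public final List<String> docIds = new ArrayList<>();

        public int count() {
            return docIds.size();
        }
    }

    /**
     * Inverts a "doc id -> matched clause names" map into a
     * "clause name -> matching doc ids" map, preserving first-seen order.
     */
    public static Map<String, Summary> summarize(Map<String, List<String>> perHit) {
        Map<String, Summary> summary = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> hit : perHit.entrySet()) {
            for (String name : hit.getValue()) {
                summary.computeIfAbsent(name, k -> new Summary())
                       .docIds.add(hit.getKey());
            }
        }
        return summary;
    }

    public static void main(String[] args) {
        // Only the two hits shown in the per-hit excerpt above.
        Map<String, List<String>> perHit = new LinkedHashMap<>();
        perHit.put("1", List.of("all_books", "fantasy"));
        perHit.put("5", List.of("all_books", "scifi"));

        summarize(perHit).forEach((name, s) ->
            System.out.println(name + ": count=" + s.count() + " docIds=" + s.docIds));
    }
}
```

  Because the summary is computed only from the hits that went through the 
second pass, its counts describe the top-N window rather than the full result 
set.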
  *Implementation Notes*                                      

  - The second pass uses *Weight.matches(LeafReaderContext, docId)* — the same 
API used by highlighters. It performs per-document posting-list seeks over the 
top-N result set only, not a full re-scan of the index.

  - *ScoreMode.COMPLETE_NO_SCORES* is used for the matches weight, allowing 
Lucene to skip score computation entirely.                                      
                                                         

  - localParams null-safety: all parsers guard localParams != null before 
reading _name so the feature is inert when a parser is used as a defType 
default (where localParams is null).  

  - *MatchedQueriesComponent* must be registered in solrconfig.xml and added to 
a request handler's component chain. It is a no-op unless matched_queries=true 
is present on the request.
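
  The localParams null-safety note above can be sketched as follows. This is an 
illustration only: a plain Map stands in for Solr's local-params object, and 
the names NameParamGuard and readName are hypothetical, not from the patch.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the null guard; a Map stands in for Solr's local params.
public class NameParamGuard {

    public static final String NAME_PARAM = "_name";

    /**
     * Returns the _name value if local params exist and contain it.
     * A null localParams (parser used as a defType default) yields empty,
     * so the caller leaves the query unwrapped.
     */
    public static Optional<String> readName(Map<String, String> localParams) {
        if (localParams == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(localParams.get(NAME_PARAM));
    }

    public static void main(String[] args) {
        System.out.println(readName(null).isPresent());                    // no local params
        System.out.println(readName(Map.of("f", "cat")).isPresent());      // local params, no _name
        System.out.println(readName(Map.of("_name", "fantasy")).orElse("")); // named clause
    }
}
```

  In a real parser the caller would wrap the parsed query with 
NamedMatches.wrapQuery(name, query) only when a name is present.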


