[ 
https://issues.apache.org/jira/browse/SOLR-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393870#comment-15393870
 ] 

Christine Poerschke commented on SOLR-9331:
-------------------------------------------


I've been reading and investigating some more w.r.t. SOLR-9331 here and also 
SOLR-9336. Here's my now improved understanding of how the request parameters 
{{rows}}, {{start}} and {{reRankDocs}} and the solrconfig.xml element 
{{queryResultWindowSize}} combine as far as the {{ReRank(Query|Collector)}} and 
the {{QueryResultCache}} are concerned.

The {{start}} parameter defaults to 0 if not supplied. Combined with the 
{{rows}} parameter it is used for paging. For example, if each page is to 
contain five documents then the requests would be:
{code}
# page 1
...&rows=5
# page 2
...&rows=5&start=5
# page 3
...&rows=5&start=10
{code}
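
To make the paging arithmetic explicit, here is a minimal Java sketch. The class and method names ({{PagingParams}}, {{startForPage}}, {{params}}) are made up for illustration and are not part of Solr; the only assumption is that pages are numbered from 1.

```java
final class PagingParams {
    /** Offset of the first document on a 1-based page. */
    static int startForPage(int page, int rows) {
        if (page < 1 || rows < 0) {
            throw new IllegalArgumentException("page must be >= 1 and rows >= 0");
        }
        return (page - 1) * rows;
    }

    /** Builds the rows/start part of the request; start=0 is the default and is omitted. */
    static String params(int page, int rows) {
        int start = startForPage(page, rows);
        return start == 0 ? "rows=" + rows : "rows=" + rows + "&start=" + start;
    }

    public static void main(String[] args) {
        for (int page = 1; page <= 3; page++) {
            System.out.println("# page " + page + "\n...&" + params(page, 5));
        }
    }
}
```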

Next, let's say we wish to apply some sort of reranking to improve search 
relevance.
* Here's what the requests would look like if we were to rerank/reorder just 
the first page of documents:
{code}
# page 1
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5
# cost: 5 docs retrieved, 5 docs reordered, 5 docs returned
#
# page 2
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5&start=5
# cost: 10 docs retrieved, 5 docs reordered, 5 docs skipped and then 5 docs returned
#
# page 3
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5&start=10
# cost: 15 docs retrieved, 5 docs reordered, 10 docs skipped and then 5 docs returned
{code}
* Here's what the requests would look like if we were to rerank/reorder the 
first five pages of documents:
{code}
# page 1
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5
# cost: 25 docs retrieved, 25 docs reordered, 5 docs returned
#
# page 2
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5&start=5
# cost: 25 docs retrieved, 25 docs reordered, 5 docs skipped and then 5 docs returned
#
# page 3
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5&start=10
# cost: 25 docs retrieved, 25 docs reordered, 10 docs skipped and then 5 docs returned
{code}
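
The cost lines in the two examples above follow one pattern, which can be sketched in Java as below. This is a simplified model, not Solr code: {{ReRankCost}} and its methods are hypothetical names, and it assumes the main query matches at least {{max(reRankDocs, start+rows)}} documents.

```java
final class ReRankCost {
    /** Documents pulled from the main query: enough to rerank and to fill the page. */
    static int retrieved(int reRankDocs, int start, int rows) {
        return Math.max(reRankDocs, start + rows);
    }

    /** Formats the cost in the same shape as the comments in the examples above. */
    static String describe(int reRankDocs, int start, int rows) {
        StringBuilder sb = new StringBuilder();
        sb.append(retrieved(reRankDocs, start, rows)).append(" docs retrieved, ");
        // retrieved >= reRankDocs, so (under the assumption above) all reRankDocs get reordered
        sb.append(reRankDocs).append(" docs reordered, ");
        if (start > 0) {
            sb.append(start).append(" docs skipped and then ");
        }
        sb.append(rows).append(" docs returned");
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int reRankDocs : new int[] {5, 25}) {
            for (int start = 0; start <= 10; start += 5) {
                System.out.println("reRankDocs=" + reRankDocs + " start=" + start
                    + " -> " + describe(reRankDocs, start, 5));
            }
        }
    }
}
```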

Next, let's think about query result caching and demonstrate why {{reRankDocs}} 
needs to be part of the 
[ReRankQuery.equalTo|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L117]
 formula:
{code}
# reranking logic: 'odd ahead of even'
# results without            reranking: 1 2 3 4 5 6 7 8 9 10
# results with reRankDocs=7  reranking: 1 3 5 7 2 4 6 8 9 10
# results with reRankDocs=10 reranking: 1 3 5 7 9 2 4 6 8 10
{code}
* conclusion: reRankDocs influences the end result and thus it must form part 
of the query result caching logic.
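
For anyone wanting to reproduce the 'odd ahead of even' rows, here is a small stand-alone Java sketch. It is a toy model of reranking (the class {{OddAheadOfEven}} is hypothetical and does not use any Solr classes): only the first {{reRankDocs}} results are reordered, the rest keep their original order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class OddAheadOfEven {
    /** Reorders the first reRankDocs results so that odd ids come ahead of even ids. */
    static List<Integer> rerank(List<Integer> results, int reRankDocs) {
        int n = Math.min(reRankDocs, results.size());
        List<Integer> head = new ArrayList<>(results.subList(0, n));
        // List.sort is stable, so documents keep their original order within each group
        head.sort(Comparator.comparingInt((Integer doc) -> doc % 2 == 0 ? 1 : 0));
        List<Integer> out = new ArrayList<>(head);
        out.addAll(results.subList(n, results.size()));
        return out;
    }

    public static void main(String[] args) {
        List<Integer> results = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(rerank(results, 7));   // [1, 3, 5, 7, 2, 4, 6, 8, 9, 10]
        System.out.println(rerank(results, 10));  // [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
    }
}
```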

Next, let's consider query result caching, with {{rows}}+{{start}} combined 
into a {{length}} variable:
{code}
# reranking logic: 'odd ahead of even'
#
#  input: reRankDocs=3&start=0&rows=6&rq=...
# output: [ 1 3 2 4 5 6 ] // populate cache (length=6)
#
#  input: reRankDocs=3&start=0&rows=3&rq=...
# output: [ 1 3 2 ] // use (length=6) cache (we need only first half subset of what is cached)
#
#  input: reRankDocs=3&start=3&rows=3&rq=...
# output: [ 4 5 6 ] // use (length=6) cache (we need only second half subset of what is cached)
#
#  input: reRankDocs=3&start=0&rows=4&rq=...
# output: [ 1 3 2 4 ] // use (length=6) cache (we need only first two thirds subset of what is cached)
#
#  input: reRankDocs=3&start=3&rows=4&rq=...
# output: cache lookup returns (length=6) cache entry with too few elements and so no cache use here
#  cache: [ 1 3 2 4 5 6 7 ] // populate cache (length=7)
# output: [ 4 5 6 7 ]
{code}
* conclusions: {{length}} not being part of the query result caching logic 
means that
** a length=6 cache entry can be used by some (but not all) subsequent 
length!=6 requests  
** the cache entry's length must be considered relative to the request's length 
and a cache hit is not always a _useable_ cache hit
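
The 'useable cache hit' condition can be sketched in Java as follows. This is a simplification with hypothetical names ({{CachedWindow}}, {{fromCache}}), not the actual {{QueryResultCache}} code, and it assumes the index holds more matching documents than the cache entry does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class CachedWindow {
    /** Returns the requested slice if the cached entry is long enough, otherwise empty. */
    static Optional<List<Integer>> fromCache(List<Integer> cached, int start, int rows) {
        int length = start + rows;
        if (length > cached.size()) {
            return Optional.empty(); // a cache hit, but not a useable one
        }
        return Optional.of(cached.subList(start, length));
    }

    public static void main(String[] args) {
        List<Integer> cached = Arrays.asList(1, 3, 2, 4, 5, 6); // populated with length=6
        System.out.println(fromCache(cached, 0, 3)); // Optional[[1, 3, 2]]
        System.out.println(fromCache(cached, 3, 3)); // Optional[[4, 5, 6]]
        System.out.println(fromCache(cached, 0, 4)); // Optional[[1, 3, 2, 4]]
        System.out.println(fromCache(cached, 3, 4)); // Optional.empty
    }
}
```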

Following on from this, we can think of the {{queryResultWindowSize}} config 
element as a 'rounded up' version of the {{length}} variable:
{code}
# reranking logic: 'odd ahead of even'
#
#  input: reRankDocs=3&start=0&rows=6&rq=...
# config: queryResultWindowSize=8
#  cache: [ 1 3 2 4 5 6 7 8 ] // populate cache (length=8)
# output: [ 1 3 2 4 5 6 ]
#
#  input: reRankDocs=3&start=3&rows=4&rq=...
# output: [ 4 5 6 7 ] // use (length=8) cache (we need only a middle subset of what is cached)
{code}
* notes:
** the first query slightly overpopulated the cache and thus the second query 
could use the cache
** the first query became slightly more expensive (8 vs. 6 docs retrieved)
** the second query became cheaper since the query result cache could be used

Finally, let's reassure ourselves that the {{queryResultWindowSize}} 'rounding 
up' does not alter query results themselves:
* The 
[mainCollector|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L209]
 is used to retrieve documents. The ReRankCollector 
[constructor|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L231-L234]
 specifies {{Math.max(reRankDocs, length)}} as the numHits for the 
mainCollector.
* The ReRankCollector.topDocs 
[method|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L258]
 obtains {{Math.max(reRankDocs, length)}} documents from the mainCollector and 
then reranks/reorders up to {{reRankDocs}} of the obtained documents.
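
As a sanity check of the two points above, here is a toy Java model. {{WindowStability.topDocs}} is a hypothetical stand-in for the mainCollector plus {{ReRankCollector.topDocs}}, using the 'odd ahead of even' reranking and assuming documents 1, 2, 3, ... in main query order. It only illustrates that a larger {{length}} appends documents at the end without changing the ones before them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class WindowStability {
    /** Retrieves max(reRankDocs, length) docs and reorders the first reRankDocs of them. */
    static List<Integer> topDocs(int reRankDocs, int length) {
        int numHits = Math.max(reRankDocs, length);
        List<Integer> docs = new ArrayList<>();
        for (int doc = 1; doc <= numHits; doc++) {
            docs.add(doc); // main query order
        }
        int n = Math.min(reRankDocs, docs.size());
        // sorting the subList view reorders the first n entries of docs in place
        docs.subList(0, n).sort(Comparator.comparingInt((Integer doc) -> doc % 2 == 0 ? 1 : 0));
        return docs;
    }

    public static void main(String[] args) {
        List<Integer> exact = topDocs(3, 6);    // length as requested
        List<Integer> windowed = topDocs(3, 8); // length rounded up to the window size
        System.out.println(exact);                  // [1, 3, 2, 4, 5, 6]
        System.out.println(windowed.subList(0, 6)); // [1, 3, 2, 4, 5, 6]
    }
}
```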

*So, based on the above analysis I would conclude that {{ReRankQuery}}'s length 
constructor argument can safely be removed (as is proposed by the attached 
patch) and that doing so would be in keeping with the {{queryResultWindowSize}} 
logic itself.*

How does that sound? What do you think?

> Can we remove ReRankQuery's length constructor argument?
> --------------------------------------------------------
>
>                 Key: SOLR-9331
>                 URL: https://issues.apache.org/jira/browse/SOLR-9331
>             Project: Solr
>          Issue Type: Wish
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Christine Poerschke
>            Priority: Minor
>         Attachments: SOLR-9331.patch
>
>
> Can we remove ReRankQuery's length constructor argument? It is a 
> ReRankQParserPlugin private class.
> proposed patch summary:
> * change ReRankQuery.getTopDocsCollector to use its len argument (instead of 
> the length member)
> * remove ReRankQuery's length member and constructor argument
> * remove ReRankQParser.parse's use of the rows and start parameters
> motivation: towards ReRankQParserPlugin and LTRQParserPlugin (SOLR-8542) 
> sharing (more) code



