[
https://issues.apache.org/jira/browse/SOLR-9331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393870#comment-15393870
]
Christine Poerschke commented on SOLR-9331:
-------------------------------------------
I have been reading and investigating some more with respect to SOLR-9331
(this issue) and SOLR-9336. Here is my improved understanding of how the
request parameters {{rows}}, {{start}} and {{reRankDocs}} and the
solrconfig.xml element {{queryResultWindowSize}} combine as far as the
{{ReRank(Query|Collector)}} and the {{QueryResultCache}} are concerned.
The {{start}} parameter defaults to 0 if not supplied. Combined with the
{{rows}} parameter it is used for paging. For example, if each page is to
contain five documents, then the requests would be:
{code}
# page 1
...&rows=5
# page 2
...&rows=5&start=5
# page 3
...&rows=5&start=10
{code}
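As a rough illustration of the paging arithmetic above (Python, not Solr code; the function name {{page_params}} is made up for this sketch):

```python
def page_params(page, rows=5):
    """Return the start/rows request parameters for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # Page 1 starts at offset 0, page 2 at offset rows, and so on.
    return {"start": (page - 1) * rows, "rows": rows}


for page in (1, 2, 3):
    print(page, page_params(page))
```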
Next, let's say we wish to apply some sort of reranking to improve search
relevance.
* Here's what the requests would look like if we were to rerank/reorder just
the first page of documents:
{code}
# page 1
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5
# cost: 5 docs retrieved, 5 docs reordered, 5 docs returned
#
# page 2
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5&start=5
# cost: 10 docs retrieved, 5 docs reordered, 5 docs skipped and then 5 docs returned
#
# page 3
...&rq={!rerank+reRankDocs=5+reRankQuery=$rrq+...}&rrq=...&rows=5&start=10
# cost: 15 docs retrieved, 5 docs reordered, 10 docs skipped and then 5 docs returned
{code}
* Here's what the requests would look like if we were to rerank/reorder the
first five pages of documents:
{code}
# page 1
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5
# cost: 25 docs retrieved, 25 docs reordered, 5 docs returned
#
# page 2
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5&start=5
# cost: 25 docs retrieved, 25 docs reordered, 5 docs skipped and then 5 docs returned
#
# page 3
...&rq={!rerank+reRankDocs=25+reRankQuery=$rrq+...}&rrq=...&rows=5&start=10
# cost: 25 docs retrieved, 25 docs reordered, 10 docs skipped and then 5 docs returned
{code}
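The cost figures in the two lists above can be reproduced with a small sketch. It is Python rather than Solr code, the name {{rerank_cost}} is hypothetical, and it assumes that at least {{max(reRankDocs, start+rows)}} documents match the query:

```python
def rerank_cost(re_rank_docs, rows, start=0):
    """Estimate the per-request work for a reranked, paged request."""
    length = start + rows
    return {
        # The main collector gathers whichever is larger.
        "retrieved": max(re_rank_docs, length),
        # Only the top reRankDocs documents are reordered.
        "reordered": re_rank_docs,
        # Paging skips the documents before the requested page.
        "skipped": start,
        "returned": rows,
    }


for re_rank_docs in (5, 25):
    for start in (0, 5, 10):
        print(re_rank_docs, start, rerank_cost(re_rank_docs, rows=5, start=start))
```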
Next, let's think about query result caching and demonstrate why {{reRankDocs}}
needs to be part of the
[ReRankQuery.equalTo|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L117]
formula:
{code}
# reranking logic: 'odd ahead of even'
# results without reranking: 1 2 3 4 5 6 7 8 9 10
# results with reRankDocs=7 reranking: 1 3 5 7 2 4 6 8 9 10
# results with reRankDocs=10 reranking: 1 3 5 7 9 2 4 6 8 10
{code}
* conclusion: {{reRankDocs}} influences the end result and thus it must form
part of the query result caching logic, i.e. two requests that differ only in
{{reRankDocs}} must not share a cache entry.
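The 'odd ahead of even' example can be replayed with a toy model (Python, not the actual {{ReRankCollector}}; the function name is invented for illustration):

```python
def rerank_odd_ahead_of_even(ranked, re_rank_docs):
    """Reorder only the top re_rank_docs documents, odd ids before even ids."""
    ranked = list(ranked)
    # sorted() is stable, so the original order is kept within each group.
    head = sorted(ranked[:re_rank_docs], key=lambda doc: doc % 2 == 0)
    return head + ranked[re_rank_docs:]


original = list(range(1, 11))
print(rerank_odd_ahead_of_even(original, 7))
print(rerank_odd_ahead_of_even(original, 10))
```

The two printed lists differ, which is the reason the two requests need separate cache entries.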
Next, let's consider query result caching, with {{rows}}+{{start}} combined
into a {{length}} variable:
{code}
# reranking logic: 'odd ahead of even'
#
# input: reRankDocs=3&start=0&rows=6&rq=...
# output: [ 1 3 2 4 5 6 ] // populate cache (length=6)
#
# input: reRankDocs=3&start=0&rows=3&rq=...
# output: [ 1 3 2 ] // use (length=6) cache (we need only the first half of what is cached)
#
# input: reRankDocs=3&start=3&rows=3&rq=...
# output: [ 4 5 6 ] // use (length=6) cache (we need only the second half of what is cached)
#
# input: reRankDocs=3&start=0&rows=4&rq=...
# output: [ 1 3 2 4 ] // use (length=6) cache (we need only the first two thirds of what is cached)
#
# input: reRankDocs=3&start=3&rows=4&rq=...
# output: cache lookup returns (length=6) cache entry with too few elements and so no cache use here
# cache: [ 1 3 2 4 5 6 7 ] // populate cache (length=7)
# output: [ 4 5 6 7 ]
{code}
* conclusions: {{length}} not being part of the query result caching logic
means that
** a length=6 cache entry can be used by some (but not all) subsequent
length!=6 requests
** the cache entry's length must be considered relative to the request's length
and a cache hit is not always a _useable_ cache hit
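A toy model of this behaviour, to make the "hit but not usable" case concrete. This is a Python sketch under simplifying assumptions, not the real {{QueryResultCache}}: the class and method names are hypothetical, the "index" is just the list 1..20, and the cache key is (query, reRankDocs) with {{length}} deliberately left out:

```python
def rerank(ranked, re_rank_docs):
    # 'odd ahead of even' applied to the top re_rank_docs documents only.
    head = sorted(ranked[:re_rank_docs], key=lambda doc: doc % 2 == 0)
    return head + ranked[re_rank_docs:]


class QueryResultCacheSketch:
    def __init__(self, index):
        self.index = list(index)
        self.entries = {}

    def search(self, query, re_rank_docs, start, rows):
        length = start + rows
        key = (query, re_rank_docs)  # length is NOT part of the key
        cached = self.entries.get(key)
        if cached is not None and len(cached) >= length:
            # Usable hit: the entry covers the requested window.
            return cached[start:length], "cache"
        # Miss, or a hit with too few elements: search and repopulate.
        retrieved = self.index[:max(re_rank_docs, length)]
        docs = rerank(retrieved, re_rank_docs)[:length]
        self.entries[key] = docs
        return docs[start:length], "search"


cache = QueryResultCacheSketch(range(1, 21))
for start, rows in ((0, 6), (0, 3), (3, 3), (0, 4), (3, 4)):
    print(start, rows, cache.search("q", 3, start, rows))
```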
Following on from this, we can think of the {{queryResultWindowSize}} config
element as a 'rounded up' version of the {{length}} variable:
{code}
# reranking logic: 'odd ahead of even'
#
# input: reRankDocs=3&start=0&rows=6&rq=...
# config: queryResultWindowSize=8
# cache: [ 1 3 2 4 5 6 7 8 ] // populate cache (length=8)
# output: [ 1 3 2 4 5 6 ]
#
# input: reRankDocs=3&start=3&rows=4&rq=...
# output: [ 4 5 6 7 ] // use (length=8) cache (we need only a middle subset of what is cached)
{code}
* notes:
** the first query slightly overpopulated the cache and thus the second query
could use the cache
** the first query became slightly more expensive (8 vs. 6 docs retrieved)
** the second query became cheaper since the query result cache could be used
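The 'rounding up' can be sketched as follows. This is Python, the function names are made up, and it assumes the rounding goes to the next multiple of {{queryResultWindowSize}}, which is my reading rather than a quote of the Solr code:

```python
def windowed_length(length, window_size):
    """Round length up to the next multiple of window_size (at least one window)."""
    windows = max(1, -(-length // window_size))  # ceiling division
    return windows * window_size


def can_serve_from_cache(cached_length, start, rows):
    """A cache entry is usable only if it covers start + rows documents."""
    return start + rows <= cached_length


cached = windowed_length(0 + 6, 8)  # first request: start=0, rows=6
print(cached, can_serve_from_cache(cached, start=3, rows=4))
print(can_serve_from_cache(6, start=3, rows=4))  # without rounding up
```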
Finally, let's reassure ourselves that the {{queryResultWindowSize}} 'rounding
up' does not alter query results themselves:
* The
[mainCollector|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L209]
is used to retrieve documents. The ReRankCollector
[constructor|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L231-L234]
specifies {{Math.max(reRankDocs, length)}} as the numHits for the
mainCollector.
* The ReRankCollector.topDocs
[method|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/ReRankQParserPlugin.java#L258]
obtains {{Math.max(reRankDocs, length)}} documents from the mainCollector and
then reranks/reorders up to {{reRankDocs}} of the obtained documents.
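To sanity-check the two points above, here is a brute-force sketch (Python, a toy stand-in for the collector rather than the real one; {{top_docs}} is a hypothetical name). It compares the first {{length}} results computed with the exact {{length}} against those computed with a rounded-up one:

```python
def top_docs(index, re_rank_docs, length):
    # Obtain max(reRankDocs, length) documents, then rerank the top reRankDocs.
    hits = index[:max(re_rank_docs, length)]
    head = sorted(hits[:re_rank_docs], key=lambda doc: doc % 2 == 0)
    return head + hits[re_rank_docs:]


index = list(range(1, 41))
mismatches = []
for re_rank_docs in range(0, 13):
    for length in range(1, 13):
        exact = top_docs(index, re_rank_docs, length)[:length]
        for window in (1, 4, 8, 20):
            rounded = -(-length // window) * window  # next multiple of window
            if top_docs(index, re_rank_docs, rounded)[:length] != exact:
                mismatches.append((re_rank_docs, length, window))

print("mismatches:", mismatches)
```

In this toy model no combination produces a mismatch, which is consistent with the reasoning above; it is of course not a substitute for a test against the real collector.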
*So, based on the above analysis I would conclude that {{ReRankQuery}}'s length
constructor argument can safely be removed (as is proposed by the attached
patch) and that doing so would be in keeping with the {{queryResultWindowSize}}
logic itself.*
How does that sound? What do you think?
> Can we remove ReRankQuery's length constructor argument?
> --------------------------------------------------------
>
> Key: SOLR-9331
> URL: https://issues.apache.org/jira/browse/SOLR-9331
> Project: Solr
> Issue Type: Wish
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Christine Poerschke
> Priority: Minor
> Attachments: SOLR-9331.patch
>
>
> Can we remove ReRankQuery's length constructor argument? It is a
> ReRankQParserPlugin private class.
> proposed patch summary:
> * change ReRankQuery.getTopDocsCollector to use its len argument (instead of
> the length member)
> * remove ReRankQuery's length member and constructor argument
> * remove ReRankQParser.parse's use of the rows and start parameters
> motivation: towards ReRankQParserPlugin and LTRQParserPlugin (SOLR-8542)
> sharing (more) code