The comment in the code reads slightly differently:

// This ensures that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

I think you're right, though, that this is confusing. The way the
ReRankingQParserPlugin works is that it grabs the top X documents
(reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large enough
to cover the requested page (start+rows), the result won't have enough
documents.

The intended use of this was actually to stop using query re-ranking when
you paged past the reRanked results. So if you re-rank the top 200
documents, you would drop the re-ranking parameter when you page to
documents 201-220.
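
To make that intended use concrete, here is a rough client-side sketch. It
is only an illustration: the class and the pageParams helper are
hypothetical names, and the choice to keep re-ranking only while the whole
page fits inside the reRankDocs window is my assumption about how a client
might handle the boundary. The rq local-params string follows the usual
{!rerank ...} form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReRankPaging {

    // Hypothetical helper: builds the request parameters for one page.
    // The re-rank parameters are only sent while the page lies entirely
    // inside the re-ranked window (an assumed policy, not Solr behavior).
    static Map<String, String> pageParams(String q, String reRankQuery,
                                          int reRankDocs, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        if (start + rows <= reRankDocs) {
            params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs + "}");
            params.put("rqq", reRankQuery);
        }
        return params;
    }

    public static void main(String[] args) {
        // Documents 181-200 are still inside the top 200, so rq is sent.
        System.out.println(pageParams("solr", "title:rerank", 200, 180, 20));
        // Documents 201-220 are past the window, so rq is dropped.
        System.out.println(pageParams("solr", "title:rerank", 200, 200, 20));
    }
}
```

With reRankDocs=200 and rows=20, the first call includes rq and rqq, and the
second sends only q, start and rows.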

So the line:
reRankDocs = Math.max(start+rows, reRankDocs);

saves you from an unexpected shortfall in documents if you do page beyond
the reRankDocs. At the very least, the expected use should be documented, and
if we can figure out better behavior here, that would be great.

Joel Bernstein
Search Engineer at Heliosearch


On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairko...@gmail.com> wrote:

> Looking at this line in the code:
>
> // This ensures that reRankDocs <= docs needed to satisfy the result set.
> reRankDocs = Math.max(start+rows, reRankDocs);
>
> This looks like it would cause skips and duplicates while paging through
> the results, since if you exceed the reRankDocs parameter and keep finding
> things that match the re-ranking query, they'll get boosted earlier
> (skipped), thus pushing down items you already saw (causing duplicates).
>
> It's obviously intentional behavior, but there's no documentation I can
> see of why, if you request fewer documents to be re-ranked than you're
> asking to view, it goes ahead and ignores the number you asked for. What if
> I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
> to make the client choose whether to increase the reRankDocs or leave it
> the same?
>
> If no one replies and I have time, I might check out 4.9 and see if I can
> confirm or disprove the bug, but figured I'd bring it up now in case I
> don't end up having time. It would be good to document the reason for this
> behavior if it turns out it's necessary.
>
> Thanks. I'm excited about this feature btw.
>
> --Adair
>
