The comment in the code reads slightly differently:

// This enusres that reRankDocs >= docs needed to satisfy the result set.
reRankDocs = Math.max(start+rows, reRankDocs);

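To make the effect of that line concrete, here is a minimal standalone sketch. It is not the plugin code: the class name ReRankDocsSketch and the helper effectiveReRankDocs are hypothetical names made up for illustration, and only the Math.max expression is taken from the line quoted above. It assumes a request with reRankDocs=200 and rows=20, and prints the value that would actually be used at a few start offsets.

```java
// Illustrative sketch only; not part of Solr. Names are hypothetical.
class ReRankDocsSketch {

    // Mirrors the quoted line: reRankDocs is raised to at least start + rows.
    static int effectiveReRankDocs(int start, int rows, int reRankDocs) {
        return Math.max(start + rows, reRankDocs);
    }

    public static void main(String[] args) {
        int reRankDocs = 200; // value requested by the client (assumed for the example)
        int rows = 20;        // page size (assumed for the example)

        // First page, last page inside the re-ranked window, first page beyond it.
        for (int start : new int[] {0, 180, 200}) {
            System.out.println("start=" + start + " rows=" + rows
                    + " -> reRankDocs=" + effectiveReRankDocs(start, rows, reRankDocs));
        }
    }
}
```

With these assumed values the requested 200 is kept for start=0 and start=180, and is raised to 220 once the page (start=200, rows=20) reaches past the requested window.
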
I think you're right, though, that this is confusing. The way the ReRankingQParserPlugin works is that it grabs the top X documents (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large enough to satisfy the page, then the result won't have enough documents.

The intended use of this was actually to stop using query re-ranking when you page past the re-ranked results. So if you re-rank the top 200 documents, you would drop the re-ranking parameter when you page to documents 201-220.

So the line:

reRankDocs = Math.max(start+rows, reRankDocs);

saves you from an unexpected shortfall in documents if you do page beyond the reRankDocs.

At the very least the expected use should be documented, and if we can figure out better behavior here that would be great.

Joel Bernstein
Search Engineer at Heliosearch

On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairko...@gmail.com> wrote:

> Looking at this line in the code:
>
> // This enusres that reRankDocs <= docs needed to satisfy the result set.
> reRankDocs = Math.max(start+rows, reRankDocs);
>
> This looks like it would cause skips and duplicates while paging through
> the results, since if you exceed the reRankDocs parameter and keep finding
> things that match the re-ranking query, they'll get boosted earlier
> (skipped), thus pushing down items you already saw (causing duplicates).
>
> It's obviously intentional behavior, but there's no documentation I can
> see of why, if you request fewer documents to be re-ranked than you're
> asking to view, it goes ahead and ignores the number you asked for. What if
> I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
> to make the client choose whether to increase the reRankDocs or leave it
> the same?
>
> If no one replies and I have time, I might check out 4.9 and see if I can
> confirm or disprove the bug, but figured I'd bring it up now in case I
> don't end up having time.
> It would be good to document the reason for this
> behavior if it turns out it's necessary.
>
> Thanks. I'm excited about this feature btw.
>
> --Adair