[
https://issues.apache.org/jira/browse/LUCENE-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413521#comment-16413521
]
Robert Muir commented on LUCENE-8221:
-------------------------------------
{quote}
If the number of documents is 0, nothing really happens. What I'd consider an
odd behavior is if this method could fluctuate depending on how many deletes or
merges you had up to the point of invoking it... It'd confuse me (and I guess
others not familiar with Lucene indexing internals) a lot.
{quote}
You use docFreq and numDocs. Because you use docFreq, it's still gonna fluctuate
with merges.
But now, in addition to that, you've got skew. Your argument fails.
Sorry, I don't think my concerns should be brushed aside here. This is
definitely related to the issue. The formula is simply wrong and we should fix
it.
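To make the skew concrete, here is a minimal sketch in plain Java with made-up numbers. It relies on the documented Lucene behaviour that numDocs() counts only live documents, while docFreq and maxDoc() still include deleted documents that have not been merged away. The class name, the helper, and the use of maxDoc as the alternative denominator are assumptions for illustration only, not the fix proposed on this issue.

```java
final class DocFreqSkewSketch {
    // Hypothetical helper: a percentage of some document count, computed in long.
    static int threshold(int pct, int docCount) {
        return (int) ((long) pct * docCount / 100);
    }

    public static void main(String[] args) {
        // Made-up index statistics, chosen only to show the effect.
        int maxDoc = 1_000; // all documents, including deleted ones not yet merged away
        int numDocs = 600;  // live documents only
        int docFreq = 500;  // the term's document frequency; still counts deleted documents

        int fromNumDocs = threshold(75, numDocs); // 450
        int fromMaxDoc = threshold(75, maxDoc);   // 750

        // The same term is filtered or kept depending on which denominator is used.
        System.out.println("docFreq > 75% of numDocs: " + (docFreq > fromNumDocs));
        System.out.println("docFreq > 75% of maxDoc:  " + (docFreq > fromMaxDoc));
    }
}
```

With these numbers the term occurs in half of the documents that docFreq counts, yet it exceeds a 75% threshold derived from numDocs. Once a merge drops the deleted documents, both docFreq and the threshold shift, so the outcome can change again.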
> MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes
> -----------------------------------------------------------------------
>
> Key: LUCENE-8221
> URL: https://issues.apache.org/jira/browse/LUCENE-8221
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: LUCENE-8221.patch
>
>
> {code}
> public void setMaxDocFreqPct(int maxPercentage) {
>   this.maxDocFreq = maxPercentage * ir.numDocs() / 100;
> }
> {code}
> The above overflows the integer range into negative numbers on even fairly small
> indexes (for maxPercentage = 75, it happens at just over 28.6 million
> documents).
> We should perform the computation in long arithmetic so that it doesn't overflow,
> and add stricter argument validation.
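The overflow described in the issue can be reproduced without Lucene. The sketch below compares the int formula quoted above with a variant that widens to long before multiplying. The class and method names are hypothetical, and the [0, 100] range check is only an assumption about what "stricter argument validation" might mean; it is not taken from the attached patch.

```java
final class MaxDocFreqPctSketch {
    // The formula from the issue: the multiplication is done in int and can wrap.
    static int naive(int maxPercentage, int numDocs) {
        return maxPercentage * numDocs / 100;
    }

    // Hypothetical fix: validate the percentage, then multiply in long.
    static int widened(int maxPercentage, int numDocs) {
        if (maxPercentage < 0 || maxPercentage > 100) {
            throw new IllegalArgumentException(
                "maxPercentage must be in [0, 100], got: " + maxPercentage);
        }
        // For a valid percentage the result never exceeds numDocs, so it fits in an int;
        // toIntExact is only a safety net.
        return Math.toIntExact((long) maxPercentage * numDocs / 100);
    }

    public static void main(String[] args) {
        // 75 * 28_633_116 = 2_147_483_700, just past Integer.MAX_VALUE (2_147_483_647).
        int numDocs = 28_633_116;
        System.out.println("int arithmetic:  " + naive(75, numDocs));   // negative
        System.out.println("long arithmetic: " + widened(75, numDocs)); // 21474837
    }
}
```

One document fewer (28,633,115) still fits, since 75 * 28,633,115 = 2,147,483,625, which is where the "just over 28 million" figure comes from.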