[jira] [Commented] (LUCENE-8221) MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes

Robert Muir (JIRA) Mon, 26 Mar 2018 00:45:34 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16413521#comment-16413521 ]


Robert Muir commented on LUCENE-8221:
-------------------------------------

{quote}

If the number of documents is 0, nothing really happens. What I'd consider an 
odd behavior is if this method could fluctuate depending on how many deletes or 
merges you had up to the point of invoking it... It'd confuse me (and, I guess, 
others not familiar with Lucene indexing internals) a lot.

{quote}

 

You use docFreq and numDocs. Because you use docFreq, it's still going to fluctuate 
with merges.

But now in addition to that, you've got skew. Your argument fails.
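To make the skew concrete, here is a small standalone Java sketch with hypothetical numbers (no Lucene classes are used; the class, constants and methods are illustrative only). It relies on documented Lucene behavior: a term's docFreq, like maxDoc, still counts deleted documents that have not yet been merged away, while numDocs counts only live documents. Comparing docFreq against a percentage of numDocs therefore mixes two different populations, so the more deletes an index carries, the more terms appear to exceed the limit. The maxDoc-based threshold is shown only as one self-consistent alternative; it is an assumption here, not necessarily the change the patch makes.

```java
final class DocFreqSkew {

    // Hypothetical index snapshot. Deleted documents stay in maxDoc and in a
    // term's docFreq until a merge reclaims them; numDocs counts live docs only.
    static final int MAX_DOC = 1_000;  // live + not-yet-merged deleted docs
    static final int NUM_DOCS = 600;   // live docs only (400 deleted)
    static final int DOC_FREQ = 500;   // hypothetical term; may include deleted docs

    // Threshold as in the quoted formula: a percentage of numDocs.
    static long thresholdFromNumDocs(int pct) {
        return (long) pct * NUM_DOCS / 100;
    }

    // Alternative threshold over maxDoc, the same population docFreq is counted against.
    static long thresholdFromMaxDoc(int pct) {
        return (long) pct * MAX_DOC / 100;
    }

    // MoreLikeThis ignores terms whose docFreq exceeds maxDocFreq.
    static boolean kept(long threshold) {
        return DOC_FREQ <= threshold;
    }

    public static void main(String[] args) {
        int pct = 75;
        System.out.println(thresholdFromNumDocs(pct) + " " + kept(thresholdFromNumDocs(pct))); // 450 false
        System.out.println(thresholdFromMaxDoc(pct) + " " + kept(thresholdFromMaxDoc(pct)));   // 750 true
    }
}
```

With the numDocs base, the 400 deleted documents shrink the threshold but not docFreq, so the term is dropped; once a merge reclaims those deletes, docFreq can drop too, which is the fluctuation mentioned above.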

Sorry, I don't think my concerns should be brushed aside here. This is 
definitely related to the issue. The formula is simply wrong and we should fix 
it.

> MoreLikeThis.setMaxDocFreqPct can easily int-overflow on larger indexes
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-8221
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8221
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-8221.patch
>
>
> {code}
>   public void setMaxDocFreqPct(int maxPercentage) {
>     this.maxDocFreq = maxPercentage * ir.numDocs() / 100;
>   }
> {code}
> The above overflows the integer range into negative numbers even on fairly small 
> indexes (for maxPercentage = 75, it happens for just over 28 million 
> documents).
> We should perform the computation in long arithmetic so that it doesn't overflow, 
> and add stricter argument validation.
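A minimal standalone Java sketch of the overflow described above and of the proposed direction (long arithmetic plus argument validation). The class and method names are hypothetical and no Lucene classes are used: numDocs is passed in directly instead of calling ir.numDocs(), and the [0, 100] range check is one possible reading of "stricter argument validation", not necessarily what the attached patch does.

```java
final class MaxDocFreqPctOverflow {

    // Same arithmetic as the quoted setMaxDocFreqPct: the multiplication is done
    // in 32-bit int and silently wraps around once it exceeds Integer.MAX_VALUE.
    static int intFormula(int maxPercentage, int numDocs) {
        return maxPercentage * numDocs / 100;
    }

    // Sketch of the proposed direction: validate the argument, then widen to long
    // before multiplying. With maxPercentage in [0, 100] the result never exceeds
    // numDocs, so it fits in an int; Math.toIntExact is only a defensive check.
    static int longFormula(int maxPercentage, int numDocs) {
        if (maxPercentage < 0 || maxPercentage > 100) {
            throw new IllegalArgumentException(
                "maxPercentage must be between 0 and 100, got: " + maxPercentage);
        }
        return Math.toIntExact((long) maxPercentage * numDocs / 100);
    }

    public static void main(String[] args) {
        // Smallest document count for which 75 * numDocs exceeds Integer.MAX_VALUE.
        int firstOverflow = Integer.MAX_VALUE / 75 + 1;
        System.out.println(firstOverflow);                  // 28633116
        System.out.println(intFormula(75, firstOverflow));  // negative (wrapped)
        System.out.println(longFormula(75, firstOverflow)); // 21474837
    }
}
```

The first overflowing count computed in main matches the "just over 28 million documents" figure in the description.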




