It looks as if the term frequency for some term exceeded the maximum.
This can happen if you supply custom term frequencies, e.g. with
TermFrequencyAttribute:
https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true
The behavior hasn't changed since 8.x, but it's possible that the
merge brought together some very "high frequency" terms that were
previously not in the same segment.
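
For reference, the "integer overflow" message in the quoted trace is what
Math.toIntExact throws whenever a long value does not fit in an int. Below is
a minimal JDK-only sketch of that behavior. The class name, method name and
values are made up for illustration, and this does not reproduce Lucene's
actual code path or show which value overflowed in your index.

```java
// Hypothetical demo class: shows only how Math.toIntExact reacts to
// values inside and outside the int range. Not Lucene code.
class ToIntExactDemo {

    // Returns the narrowed value as text, or the exception message
    // when the long does not fit in an int.
    static String narrow(long value) {
        try {
            return Integer.toString(Math.toIntExact(value));
        } catch (ArithmeticException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        long fits = Integer.MAX_VALUE;              // largest value an int can hold
        long tooBig = (long) Integer.MAX_VALUE + 1; // one past the int range

        System.out.println(narrow(fits));   // prints "2147483647"
        System.out.println(narrow(tooBig)); // prints "integer overflow"
    }
}
```

So any per-term value that is accumulated as a long and then narrowed to an
int at write time would fail this way once it passes Integer.MAX_VALUE.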

On Tue, May 7, 2024 at 4:03 AM Jerven Tjalling Bolleman
<jerven.bolleman@sib.swiss> wrote:
>
> Dear Lucene community,
>
> This morning I found this exception in our logs. This was the first time
> we indexed this data with Lucene 9.10. Before, we were still on the
> Lucene 8.x branch. Between the last indexing with 8 and this one with
> 9.10 we have a bit more data, so it could be something else that went
> over a limit.
>
> Unfortunately, from this log message I am at a loss as to what is going
> on and what I could do to prevent this from happening. Does anyone have
> any ideas?
>
> Regards,
> Jerven Bolleman
>
>
> Exception in thread "Lucene Merge Thread #202" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArithmeticException: integer overflow
> at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
> Caused by: java.lang.ArithmeticException: integer overflow
> at java.base/java.lang.Math.toIntExact(Math.java:1135)
> at org.apache.lucene.store.DataOutput.writeGroupVInts(DataOutput.java:354)
> at org.apache.lucene.codecs.lucene99.Lucene99PostingsWriter.finishTerm(Lucene99PostingsWriter.java:379)
> at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:173)
> at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.write(Lucene90BlockTreeTermsWriter.java:1097)
> at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:398)
> at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
> at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
> at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
> at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
> at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
> at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
> at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
> at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
> at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
