Dear Michael,

Thank you for your help.

We don't use custom term frequencies (I just double-checked with a code search). We also always merge down to one segment. This is partly historical, but it also fits our workflow: we index once, nothing changes for a week to a month, and then we reindex every document from scratch.

Your response is very helpful already and I very much appreciate it as
it cuts down the search space significantly.

Regards,
Jerven


On 5/7/24 14:03, Michael Sokolov wrote:
It seems as if the term frequency for some term exceeded the maximum.
This can happen if you supplied custom term frequencies, e.g. with
TermFrequencyAttribute:
https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true
The behavior didn't change since 8.x, but it's possible that the
merging brought together some very "high frequency" terms that were
previously not in the same segment?
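
For reference, the "integer overflow" message in the trace quoted below comes from java.lang.Math.toIntExact, which throws ArithmeticException when a long does not fit in an int. A minimal sketch of that JDK behavior follows; the class and method names are made up for illustration, and it does not show how Lucene arrives at the oversized value, which this thread has not established.

```java
public class ToIntExactDemo {
    // Hypothetical helper: returns true when narrowing the value to int
    // would overflow, i.e. when Math.toIntExact throws ArithmeticException.
    public static boolean overflowsInt(long value) {
        try {
            Math.toIntExact(value);
            return false;
        } catch (ArithmeticException e) {
            // The JDK message here is "integer overflow", as in the trace.
            return true;
        }
    }

    public static void main(String[] args) {
        long fits = Integer.MAX_VALUE;        // largest value an int can hold
        long tooBig = Integer.MAX_VALUE + 1L; // one past the int range

        System.out.println(overflowsInt(fits));   // prints false
        System.out.println(overflowsInt(tooBig)); // prints true
    }
}
```

Any long above Integer.MAX_VALUE (2147483647) or below Integer.MIN_VALUE triggers the exception, so the value being written during the merge must have left that range.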

On Tue, May 7, 2024 at 4:03 AM Jerven Tjalling Bolleman
<jerven.bolleman@sib.swiss> wrote:

Dear Lucene community,

This morning I found this exception in our logs. This was the first time
we indexed this data with Lucene 9.10; before that we were still on the
Lucene 8.x branch. Between the last indexing run with 8.x and this one
with 9.10 we gained a bit more data, so it could be something else that
went over a limit.

Unfortunately, from this log message I am at a loss as to what is going
on and what I could do to prevent it from happening. Does anyone have
any ideas?

Regards,
Jerven Bolleman


Exception in thread "Lucene Merge Thread #202" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArithmeticException: integer overflow
    at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
Caused by: java.lang.ArithmeticException: integer overflow
    at java.base/java.lang.Math.toIntExact(Math.java:1135)
    at org.apache.lucene.store.DataOutput.writeGroupVInts(DataOutput.java:354)
    at org.apache.lucene.codecs.lucene99.Lucene99PostingsWriter.finishTerm(Lucene99PostingsWriter.java:379)
    at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:173)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.write(Lucene90BlockTreeTermsWriter.java:1097)
    at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:398)
    at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
    at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
    at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
    at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
