This is definitely a confusing error condition. If we can add more
information without creating an undue burden for the indexer, it would
be nice, but I think this will be very challenging here, since the
exception is thrown at a low level in the code where there might not
be much useful info (i.e. the field name) to provide. I also expect
there are other places that make a similar assumption, which we would
have to track down.
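
For illustration only, here is a minimal sketch of the kind of wrapper
being discussed. It is not Lucene code: the class OverflowContext and the
method toIntExactForField are hypothetical names, and it assumes the
caller sits at a level where the field name is known, which is not the
case where Math.toIntExact is called in the stack trace below.

```java
/** Hypothetical helper, not part of Lucene: adds field context to an int overflow. */
final class OverflowContext {
    private OverflowContext() {}

    /** Narrows a long to an int like Math.toIntExact, but names the field on failure. */
    static int toIntExactForField(String fieldName, long value) {
        try {
            return Math.toIntExact(value);
        } catch (ArithmeticException e) {
            // Keep the original exception as the cause so the low-level trace is preserved.
            throw new IllegalStateException(
                "integer overflow for field \"" + fieldName + "\": value "
                    + value + " does not fit in an int", e);
        }
    }
}
```

A call such as OverflowContext.toIntExactForField("myField", 3_000_000_000L)
would then fail with a message that names "myField" instead of the bare
"integer overflow".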

On Tue, May 7, 2024 at 9:10 AM Jerven Tjalling Bolleman
<jerven.bolleman@sib.swiss> wrote:
>
> Dear Michael,
>
> Looking deeper into this, I think we overflowed a term frequency field.
> Looking at some statistics, in a previous release we had 1,288,526,281
> occurrences of a certain field, and this would be larger now. Each of
> these would have had a limited set of values, but crucially nearly all
> of them would have had the term "positional" or "non-positional" added
> to the document.
>
> There is no good reason to do this today; we should just turn this into
> a boolean field and update the UI. I will do this and report back.
>
> Do you think that a patch adding a try/catch for a more informative log
> message would be appreciated by the community, e.g. mentioning the field
> name in the exception?
>
> Regards,
> Jerven
>
> On 5/7/24 14:52, Jerven Tjalling Bolleman wrote:
> > Dear Michael,
> >
> > Thank you for your help.
> >
> > We don't use custom term frequencies (I just double-checked with a code
> > search).
> > We also always merge down to one segment (partly for historical reasons,
> > but also because we index once, then there are no changes for a week to
> > a month, and then we reindex every document from scratch).
> >
> > Your response is very helpful already and I very much appreciate it as
> > it cuts down the search space significantly.
> >
> > Regards,
> > Jerven
> >
> >
> > On 5/7/24 14:03, Michael Sokolov wrote:
> >> It seems as if the term frequency for some term exceeded the maximum.
> >> This can happen if you supplied custom term frequencies, e.g. with
> >> https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/analysis/tokenattributes/TermFrequencyAttribute.html?is-external=true
> >> The behavior hasn't changed since 8.x, but it's possible that the
> >> merging brought together some very "high frequency" terms that were
> >> previously not in the same segment?
> >>
> >> On Tue, May 7, 2024 at 4:03 AM Jerven Tjalling Bolleman
> >> <jerven.bolleman@sib.swiss> wrote:
> >>>
> >>> Dear Lucene community,
> >>>
> >>> This morning I found this exception in our logs. This was the first time
> >>> we indexed this data with Lucene 9.10. Before, we were still on the
> >>> Lucene 8.x branch. Between the last indexing with 8 and this one with
> >>> 9.10 we gained a bit more data, so it could be something else that went
> >>> over a limit.
> >>>
> >>> Unfortunately, from this log message I am at a loss for what is going
> >>> on and what I could do to prevent this from happening. Does anyone have
> >>> any ideas?
> >>>
> >>> Regards,
> >>> Jerven Bolleman
> >>>
> >>>
> >>> Exception in thread "Lucene Merge Thread #202" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArithmeticException: integer overflow
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:735)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:727)
> >>> Caused by: java.lang.ArithmeticException: integer overflow
> >>>     at java.base/java.lang.Math.toIntExact(Math.java:1135)
> >>>     at org.apache.lucene.store.DataOutput.writeGroupVInts(DataOutput.java:354)
> >>>     at org.apache.lucene.codecs.lucene99.Lucene99PostingsWriter.finishTerm(Lucene99PostingsWriter.java:379)
> >>>     at org.apache.lucene.codecs.PushPostingsWriterBase.writeTerm(PushPostingsWriterBase.java:173)
> >>>     at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter$TermsWriter.write(Lucene90BlockTreeTermsWriter.java:1097)
> >>>     at org.apache.lucene.codecs.lucene90.blocktree.Lucene90BlockTreeTermsWriter.write(Lucene90BlockTreeTermsWriter.java:398)
> >>>     at org.apache.lucene.codecs.FieldsConsumer.merge(FieldsConsumer.java:95)
> >>>     at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsWriter.merge(PerFieldPostingsFormat.java:205)
> >>>     at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:209)
> >>>     at org.apache.lucene.index.SegmentMerger.mergeWithLogging(SegmentMerger.java:298)
> >>>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:137)
> >>>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:5252)
> >>>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4740)
> >>>     at org.apache.lucene.index.IndexWriter$IndexWriterMergeSource.merge(IndexWriter.java:6541)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:639)
> >>>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:700)
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >>> For additional commands, e-mail: java-user-h...@lucene.apache.org
