I opened an issue for this one (https://github.com/apache/lucene/issues/13373). Please feel free to edit or add more info to it.
Regards,
Sanjay

On Wed, May 15, 2024 at 8:07 PM Michael McCandless <luc...@mikemccandless.com> wrote:

> Thanks Jerven, more response inlined below:
>
> On Tue, May 14, 2024 at 12:58 PM Jerven Tjalling Bolleman
> <jerven.bolleman@sib.swiss> wrote:
> > The index that had an issue when merging into one segment definitely had
> > more than 1 billion times the word "positional" in it. I hope to be able
> > to give a closer number once re-indexing finished with a "work-around".
> >
> > Of course the "work-around" is to just fix this correctly by not having
> > that word so often in the index and definitely not as docs, freqs and
> > postings.
> >
>
> To be clear, indexing a given token like "positional" (nice token btw) as
> many times as you like into a Lucene index, even force merging down to a
> single segment, is perfectly allowed, and it certainly should not throw an
> exception, let alone a cryptic one like this! That's a valid use-case.
>
> So we really need to understand why you're even hitting an exception in the
> first place ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>