chenhh021 opened a new issue, #775: URL: https://github.com/apache/lucenenet/issues/775
I found this bug while searching Lucene for a solution to the last issue I reported. It is reproducible in Lucene.NET, even though the original report is for Lucene 8.x. The description below is taken from [LUCENE-10059](https://issues.apache.org/jira/browse/LUCENE-10059).

In a rare case, an AssertionException will be thrown in the backtrace step of JapaneseTokenizer. If there is a text span of length 1024 (determined by MAX_BACKTRACE_GAP) in which the regular backtrace is not called, a forced backtrace is applied. If the partial best path at that point happens to end at the last position, the final backtrace (which is always applied at the end) will try to backtrace from and to the same position. This causes the assertion in RollingCharBuffer.get() to fail when it tries to produce an empty buffer.

It can be reproduced by adding the following test to [TestJapaneseTokenizer](https://github.com/apache/lucenenet/blob/11806edbdaa4686b73806066165f27cbbd9aef3b/src/Lucene.Net.Tests.Analysis.Kuromoji/TestJapaneseTokenizer.cs):

```c#
[Test]
public void TestEmptyBacktrace()
{
    // Since the max backtrace gap (JapaneseTokenizer.MAX_BACKTRACE_GAP) is set
    // to 1024, we want the first 1023 characters to generate multiple paths so
    // that the regular backtrace is not executed.
    string text = new string('あ', 1023);
    // ...and the last 2 characters to be a valid word so that they
    // end up together.
    text += "手紙";

    // Expected tokens: 511 x "ああ", then a single "あ", then "手紙".
    var outputs = new List<string>();
    for (int i = 0; i < 511; i++)
    {
        outputs.Add("ああ");
    }
    outputs.Add("あ");
    outputs.Add("手紙");
    AssertAnalyzesTo(analyzer, text, outputs.ToArray());
}
```

This can be fixed by stopping the backtrace when the from and to positions are the same. I will create a PR that ports the [Lucene patch](https://github.com/apache/lucene/pull/254/files#diff-519e00792a2747b10ceb9bb643057485e79135502b5869ea6f7ea284e7dafce6). The PR may break parity with Lucene 4.8 and may not be accepted.
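To make the failure mode and the proposed guard concrete, here is a minimal sketch. It is a hypothetical, heavily simplified model, not the actual Lucene.NET code or the ported patch: `BacktraceModel`, `Backtrace`, `guardEmpty` and the private `Get` helper are invented for illustration. `Get` stands in for `RollingCharBuffer.get()`, which asserts that the requested length is positive, and the span between the last backtrace position and the end position is emitted as a single token instead of being segmented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified model of the backtrace bookkeeping described in
// LUCENE-10059. These are NOT Lucene.NET APIs: the real JapaneseTokenizer walks
// a lattice of positions and reads characters through RollingCharBuffer.
public sealed class BacktraceModel
{
    private readonly string text;
    private int lastBackTracePos; // end position of the previous backtrace

    public List<string> Tokens { get; } = new List<string>();

    public BacktraceModel(string text) => this.text = text;

    // Stand-in for RollingCharBuffer.get(posStart, length), which asserts
    // length > 0, i.e. it refuses to produce an empty buffer.
    private string Get(int posStart, int length)
    {
        if (length <= 0)
            throw new InvalidOperationException("Assertion failed: length > 0");
        return text.Substring(posStart, length);
    }

    // Emits the span [lastBackTracePos, endPos) as a single token (no real
    // segmentation). When guardEmpty is true, the fix is applied: stop the
    // backtrace if the from and to positions are the same.
    public void Backtrace(int endPos, bool guardEmpty)
    {
        if (guardEmpty && endPos == lastBackTracePos)
            return; // nothing left to emit, so Get() is never asked for 0 chars
        Tokens.Add(Get(lastBackTracePos, endPos - lastBackTracePos));
        lastBackTracePos = endPos;
    }
}
```

For example, with the text `ああ手紙`, calling `Backtrace(4, guardEmpty: true)` twice (a forced backtrace that happens to end at the last position, followed by the final backtrace) emits one token and then returns early on the second call. With `guardEmpty: false`, the second call reaches `Get(4, 0)` and throws, mirroring the assertion failure in the report.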
I decided to create the PR anyway, in case someone else runs into the same problem. BTW, I found that several other Lucene bugs also exist in Lucene.NET. I've done most of the work of porting the patches and will create PRs just for reference.

-- This is an automated message from the Apache Git Service.