On Tue, Apr 26, 2022 at 8:47 AM Robert Muir <[email protected]> wrote:
> Analyzers typically have a "testRandomHugeStrings()" in addition to
> "testRandom()". It uses huge strings but less iterations of the test
> (due to time). And yes, this is the same tester-method that
> TestRandomChains uses.
>
> Hi Mike, I don't think this is the only unit test for indexwriter for
> this situation. There is also a whole dedicated class:
> https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestExceedMaxTermLength.java

Great points Rob! I didn't realize we had a dedicated test class for
too-long terms as well. Awesome!

I love the BaseTokenStreamTestCase.checkRandomData!! It has found so many
crazy issues over the years... it looks like it "typically" makes tokens up
to 8K (hmm sometimes 1K, depending on the specific test class) in length,
joined with a space character. Probably that is good enough, no need to
push the token length beyond IW's hard limit?

Mike McCandless

http://blog.mikemccandless.com
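
For illustration only, here is a minimal, self-contained sketch of the idea discussed above: random tokens of bounded length, joined with a space, compared against the indexing limit. It is not the code of BaseTokenStreamTestCase.checkRandomData. The class and method names are hypothetical, the 8K bound is assumed to mean 8192 chars, and the constant is assumed to mirror IndexWriter.MAX_TERM_LENGTH (32766 UTF-8 bytes).

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical sketch; does not depend on Lucene.
public class RandomTokenLengthSketch {
    // Assumption: mirrors IndexWriter.MAX_TERM_LENGTH, a limit in UTF-8 bytes.
    static final int MAX_TERM_LENGTH_BYTES = 32766;

    // Builds tokenCount random lowercase ASCII tokens, each 1..maxTokenChars
    // chars long, joined with a single space.
    static String randomDocument(Random random, int tokenCount, int maxTokenChars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokenCount; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int len = 1 + random.nextInt(maxTokenChars);
            for (int j = 0; j < len; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
        }
        return sb.toString();
    }

    // Returns the UTF-8 byte length of the longest space-separated token.
    static int longestTokenBytes(String text) {
        int longest = 0;
        for (String token : text.split(" ")) {
            longest = Math.max(longest, token.getBytes(StandardCharsets.UTF_8).length);
        }
        return longest;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        String doc = randomDocument(random, 20, 8192);
        int longest = longestTokenBytes(doc);
        System.out.println("longest token: " + longest + " bytes; exceeds limit: "
            + (longest > MAX_TERM_LENGTH_BYTES));
    }
}
```

Under these assumptions an 8192-char token never reaches the limit: even at 3 UTF-8 bytes per UTF-16 char it is at most 24576 bytes, so terms longer than the limit would have to come from a dedicated test such as TestExceedMaxTermLength.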
