On Tue, Apr 26, 2022 at 8:47 AM Robert Muir <[email protected]> wrote:

> Analyzers typically have a "testRandomHugeStrings()" in addition to
> "testRandom()". It uses huge strings but less iterations of the test
> (due to time). And yes, this is the same tester-method that
> TestRandomChains uses.


> Hi Mike, I don't think this is the only unit test for indexwriter for
> this situation. There is also a whole dedicated class:
> https://github.com/apache/lucene/blob/main/lucene/core/src/test/org/apache/lucene/index/TestExceedMaxTermLength.java


Great points Rob!  I didn't realize we had a dedicated test class for
too-long terms as well.  Awesome!

I love BaseTokenStreamTestCase.checkRandomData!!  It has found so many
crazy issues over the years... it looks like it "typically" makes tokens up
to 8K (hmm, sometimes 1K, depending on the specific test class) in length,
joined with a space character.  That is probably good enough; no need to
push the token length beyond IndexWriter's hard limit
(IndexWriter.MAX_TERM_LENGTH)?
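
To put rough numbers on that, here is a minimal standalone sketch in plain
Java. It is not the actual checkRandomData code: the class and helper names
are made up for illustration, it only generates ASCII tokens, and it assumes
the 8K figure is counted in chars. The one constant taken from Lucene is
IndexWriter.MAX_TERM_LENGTH, which is 32766 UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

// Hypothetical sketch, not Lucene test code.
class TokenLengthSketch {
  // Same value as IndexWriter.MAX_TERM_LENGTH (limit on a term's UTF-8 byte length).
  public static final int MAX_TERM_LENGTH = 32766;

  // Builds `count` random lowercase ASCII tokens, each 1..maxTokenLength chars,
  // joined with a single space (an assumption about how the random text is shaped).
  public static String randomText(Random random, int count, int maxTokenLength) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < count; i++) {
      if (i > 0) {
        sb.append(' ');
      }
      int len = 1 + random.nextInt(maxTokenLength);
      for (int j = 0; j < len; j++) {
        sb.append((char) ('a' + random.nextInt(26)));
      }
    }
    return sb.toString();
  }

  // Returns the UTF-8 byte length of the longest space-separated token.
  public static int longestTokenBytes(String text) {
    int max = 0;
    for (String token : text.split(" ")) {
      max = Math.max(max, token.getBytes(StandardCharsets.UTF_8).length);
    }
    return max;
  }

  public static void main(String[] args) {
    String text = randomText(new Random(42), 20, 8192);
    int longest = longestTokenBytes(text);
    System.out.println("longest token (UTF-8 bytes): " + longest);
    System.out.println("under hard limit: " + (longest <= MAX_TERM_LENGTH));
  }
}
```

Even in the worst case for non-ASCII text, a UTF-16 char encodes to at most 3
UTF-8 bytes, so an 8192-char token tops out at 24576 bytes, still under 32766.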

Mike McCandless

http://blog.mikemccandless.com
