NightOwl888 commented on issue #618:
URL: https://github.com/apache/lucenenet/issues/618#issuecomment-1058810622
Just out of curiosity, do all of your use cases work without the `LowerCaseFilter`? Lowercasing is not the same as case folding (which is what `ICUFoldingFilter` does):

- *Lowercasing:* Converts the entire string from uppercase to lowercase _in the invariant culture_.
- *Case folding:* Folds the case while handling international special cases such as the [infamous Turkish uppercase dotted i](http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html) and the German "ß" (among others).

```c#
AssertAnalyzesTo(a, "Fuß", new string[] { "fuss" }); // German
AssertAnalyzesTo(a, "QUİT", new string[] { "quit" }); // Turkish
```

While this might not matter for your use case, it is also worth noting that performance will be improved without the `LowerCaseFilter`.

In addition, search performance and accuracy can be improved by using a `StopFilter` with a reasonable stop word set to cover your use cases - the only reason I removed it from the demo was because the question was about removing diacritics.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
