NightOwl888 commented on issue #618:
URL: https://github.com/apache/lucenenet/issues/618#issuecomment-1058810622


   Just out of curiosity, do all of your use cases work without the 
`LowerCaseFilter`?
   
   Lowercasing is not the same as case folding (which is what 
`ICUFoldingFilter` does):
   
   - *Lowercasing:* Converts the entire string from uppercase to lowercase _in 
the invariant culture_.
   - *Case folding:* Folds the case while handling international special cases 
such as the [infamous Turkish uppercase dotted 
i](http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html) and 
the German "ß" (among others).
   
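   ```c#
   // `a` is an analyzer whose filter chain includes ICUFoldingFilter.
   AssertAnalyzesTo(a, "Fuß", new string[] { "fuss" });  // German: "ß" folds to "ss"
   AssertAnalyzesTo(a, "QUİT", new string[] { "quit" }); // Turkish: dotted "İ" folds to "i"
   ```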
   
   While this might not matter for your use case, it is also worth noting that 
removing the `LowerCaseFilter` improves performance, because every token then 
passes through one fewer filter.
   
   In addition, search performance and accuracy can be improved by using a 
`StopFilter` with a reasonable stop word set to cover your use cases - the only 
reason I removed it from the demo was that the question was about removing 
diacritics.
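   
   As a rough sketch of what such a chain could look like (not the original 
demo): the class name `FoldingAnalyzerSketch`, the `Analyze` helper, the field 
name `"body"`, and the choice of `StandardTokenizer` with the built-in English 
stop word set are all assumptions made for illustration. It assumes Lucene.NET 
4.8 with the ICU package referenced.
   
   ```c#
   using System.Collections.Generic;
   using System.IO;
   using Lucene.Net.Analysis;
   using Lucene.Net.Analysis.Core;
   using Lucene.Net.Analysis.Icu;
   using Lucene.Net.Analysis.Standard;
   using Lucene.Net.Analysis.TokenAttributes;
   using Lucene.Net.Util;
   
   // Hypothetical helper class, for illustration only.
   public static class FoldingAnalyzerSketch
   {
       private const LuceneVersion Version = LuceneVersion.LUCENE_48;
   
       public static Analyzer Create()
       {
           return Analyzer.NewAnonymous(createComponents: (fieldName, reader) =>
           {
               // Assumption: StandardTokenizer; swap in whatever tokenizer you use.
               Tokenizer tokenizer = new StandardTokenizer(Version, reader);
   
               // No LowerCaseFilter: ICUFoldingFilter already folds case
               // (and removes diacritics).
               TokenStream stream = new ICUFoldingFilter(tokenizer);
   
               // StopFilter runs after folding, so the lowercase stop words
               // also match tokens that were capitalized in the input.
               stream = new StopFilter(Version, stream, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
   
               return new TokenStreamComponents(tokenizer, stream);
           });
       }
   
       // Collects the terms the analyzer emits for a piece of text.
       public static IList<string> Analyze(Analyzer analyzer, string text)
       {
           var terms = new List<string>();
           using (TokenStream ts = analyzer.GetTokenStream("body", new StringReader(text)))
           {
               ICharTermAttribute termAtt = ts.AddAttribute<ICharTermAttribute>();
               ts.Reset();
               while (ts.IncrementToken())
               {
                   terms.Add(termAtt.ToString());
               }
               ts.End();
           }
           return terms;
       }
   }
   ```
   
   Placing the `StopFilter` after the `ICUFoldingFilter` is a design choice in 
this sketch: the stop word set is lowercase, so filtering before folding would 
miss capitalized stop words unless the set were built to ignore case.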


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

