NightOwl888 commented on issue #296:
URL: https://github.com/apache/lucenenet/issues/296#issuecomment-665507225


   I traced another `IndexOutOfRangeException` in the `ThaiTokenizer` to an 
invalid cast from `int` to `char`, which caused it to filter out surrogate 
pairs when it shouldn't have. This is the second such issue I have found this 
week, and a search of the analyzers for the string `(char)` suggests that the 
problem affects several of them. This is definitely a bug that we will need to 
address.
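
   To illustrate the general failure mode (this is not the actual 
`ThaiTokenizer` code), here is a minimal Java sketch of what a narrowing cast 
does to a code point outside the Basic Multilingual Plane. The class and 
helper names are hypothetical and exist only for this example; the assumption 
is that the code point arrives as an `int`, as it does in code-point-aware 
tokenizer logic.

```java
public class SurrogateCastDemo {

    // Hypothetical helper showing the buggy pattern: the narrowing cast
    // keeps only the low 16 bits of the code point.
    static String appendWithCast(int codePoint) {
        return new StringBuilder().append((char) codePoint).toString();
    }

    // Hypothetical helper showing a code-point-aware alternative:
    // appendCodePoint writes a surrogate pair when one is required.
    static String appendAsCodePoint(int codePoint) {
        return new StringBuilder().appendCodePoint(codePoint).toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x1F600; // a supplementary character (above U+FFFF)

        String truncated = appendWithCast(codePoint);
        String intact = appendAsCodePoint(codePoint);

        // The cast yields a single, unrelated UTF-16 code unit.
        System.out.println(truncated.length());                        // 1
        System.out.println(Integer.toHexString(truncated.charAt(0)));  // f600

        // The code-point-aware version yields a proper surrogate pair.
        System.out.println(intact.length());                           // 2
        System.out.println(intact.codePointAt(0) == codePoint);        // true
    }
}
```

   In .NET, `char.ConvertFromUtf32(int)` is one way to get the string for a 
code point without truncating it, though the right fix in each analyzer will 
depend on the surrounding code.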
   
   It might also be useful to know whether the problem you are seeing happens 
in all cultures. In Java, none of the methods are culture-sensitive, so to 
match that behavior we should be using the invariant culture. .NET has 
[several methods that are culture-sensitive by 
default](https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings).
While we have gone through the code to ensure we are not calling any of them 
in places where we shouldn't be, a case or two could have been missed or 
recently added. If you switch the current thread to the invariant culture, 
does the problem go away?

