paulirwin commented on PR #1089:
URL: https://github.com/apache/lucenenet/pull/1089#issuecomment-2585058944

   The change to the tests exposed something else to fix: Java's standard 
Charsets will throw on malformed input and unmappable characters by default, 
meaning `CodingErrorAction.REPORT` is already the default. In .NET, the default 
encodings replace instead of throw, but our 
`StandardCharsets.UTF_8`/`IOUtils.ENCODING_UTF_8_NO_BOM` is already set up to 
match Java in this behavior. That means we do need to explicitly use 
`DecoderReplacementFallback` for those tests mentioned above in "Explicit 
Encoding Char Replacement" if we use `StandardCharsets.UTF_8`, and will likely 
need another extension method. But more importantly, this means that in any 
place where Lucene uses a Charset without specifying `CodingErrorAction.REPLACE`, 
we need to use an encoding that throws (such as our `StandardCharsets.UTF_8` 
support value), or else we will not throw on invalid byte sequences where Lucene 
does (e.g. if we use `Encoding.UTF8`). I'll get this change made tomorrow.
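
For reference, here is a minimal Java sketch of the behavior being matched. It is not taken from Lucene; the class and method names (`DecoderErrorActionDemo`, `decodeStrict`, `decodeReplacing`) are made up for illustration. It assumes the decoder comes from `Charset.newDecoder()`, which starts out with `CodingErrorAction.REPORT` for both malformed input and unmappable characters. Convenience APIs such as `new String(byte[], Charset)` always replace, so they are deliberately not used here.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene or Lucene.NET.
public class DecoderErrorActionDemo {

    // Fresh decoder left at its defaults: both error actions are REPORT,
    // so invalid bytes surface as a CharacterCodingException.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Decoder explicitly switched to REPLACE: invalid bytes become U+FFFD.
    static String decodeReplacing(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = { 'a', (byte) 0xFF, 'b' };

        try {
            decodeStrict(invalid);
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: threw " + e.getClass().getSimpleName());
        }

        System.out.println("replacing: " + decodeReplacing(invalid));
    }
}
```

On the .NET side, the framework offers the same split: `new UTF8Encoding(false, true)` throws on invalid bytes, while `Encoding.UTF8` replaces them. Whether our `StandardCharsets.UTF_8` value is constructed exactly that way is an assumption here.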


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@lucenenet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
