paulirwin commented on PR #1089: URL: https://github.com/apache/lucenenet/pull/1089#issuecomment-2585058944
The change to the tests exposed something else to fix: Java's standard Charsets throw on malformed input and unmappable characters by default, meaning `CodingErrorAction.REPORT` is already the default. In .NET, the default encodings replace instead of throwing, but our `StandardCharsets.UTF_8`/`IOUtils.ENCODING_UTF_8_NO_BOM` is already set up to match Java's behavior. That means we do need to explicitly use `DecoderReplacementFallback` for the tests mentioned above under "Explicit Encoding Char Replacement" if we use `StandardCharsets.UTF_8`, and we will likely need another extension method for that.

More importantly, this means that anywhere Lucene uses a Charset without specifying `CodingErrorAction.REPLACE`, we need to use an encoding that throws (such as our `StandardCharsets.UTF_8` support value). Otherwise, for example if we use `Encoding.UTF8`, we silently replace invalid byte sequences where Lucene throws.

I'll get this change made tomorrow.
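For reference, here is a minimal Java sketch of the two behaviors being contrasted. It assumes the decoder is obtained through `Charset.newDecoder()`, which is where `CodingErrorAction.REPORT` is the default; convenience APIs such as `new String(bytes, charset)` replace instead. The class and method names (`DecoderDefaults`, `decodeStrict`, `decodeLenient`) are hypothetical and exist only for illustration; they are not taken from Lucene or Lucene.NET.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Lucene or Lucene.NET.
class DecoderDefaults {
    // The byte 0xFF can never appear in well-formed UTF-8.
    static final byte[] MALFORMED = { 'a', (byte) 0xFF, 'b' };

    // A fresh decoder uses CodingErrorAction.REPORT for both error kinds,
    // so malformed input raises a CharacterCodingException.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Opting in to REPLACE substitutes U+FFFD for each malformed sequence,
    // which is roughly what .NET's Encoding.UTF8 does by default.
    static String decodeLenient(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        // Lenient decoding yields "a\uFFFDb".
        String lenient = decodeLenient(MALFORMED);
        System.out.println("lenient contains U+FFFD: " + (lenient.indexOf('\uFFFD') >= 0));

        try {
            decodeStrict(MALFORMED);
            System.out.println("strict: decoded without error");
        } catch (CharacterCodingException e) {
            System.out.println("strict: threw " + e.getClass().getSimpleName());
        }
    }
}
```

On the .NET side, the strict variant corresponds to an encoding configured to throw on invalid bytes, and the lenient variant corresponds to one using a replacement fallback.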