paulirwin commented on PR #1089: URL: https://github.com/apache/lucenenet/pull/1089#issuecomment-2581662815
@NightOwl888 Regarding:

> For the Encoding-derived classes, the default fallback character is ?, but in Java it is \uFFFD. The new System.Text.Unicode.Utf8 class uses the same default fallback character as in Java.

This is only true for `Encoding.ASCII` (at least among the encodings defined on `Encoding`). `Encoding.UTF8` returns `\uFFFD` by default (from csharprepl):

```
> Encoding.ASCII.GetString(new byte[] { 0xc3 })
"?"
> Encoding.UTF8.GetString(new byte[] { 0xc3 })
"�"
> Encoding.UTF8.GetString(new byte[] { 0xc3 })[0] == '\ufffd'
true
```

I confirmed this is the case on net462 as well.

The only place in non-test code where `Encoding.ASCII` is used is `ConnectionCostsBuilder`, and the fallback character doesn't matter there because this PR changes it to throw on invalid characters anyway. Given that we use UTF8 everywhere else, I don't think we need to make that change.
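
For reference, here is a minimal sketch of what "throw on invalid characters" can look like for ASCII decoding. It is an illustration only and not necessarily how this PR implements it; the `strictAscii` variable name is hypothetical. It relies on the `Encoding.GetEncoding(string, EncoderFallback, DecoderFallback)` overload with the built-in exception fallbacks.

```csharp
using System;
using System.Text;

// Hypothetical example, not taken from the PR: an ASCII encoding that
// throws instead of substituting '?' for bytes it cannot decode.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool rejected = false;
try
{
    // 0xc3 is outside the 7-bit ASCII range, so decoding should fail.
    strictAscii.GetString(new byte[] { 0xc3 });
}
catch (DecoderFallbackException ex)
{
    rejected = true;
    Console.WriteLine($"Invalid byte rejected: {ex.Message}");
}
```

With the exception fallback in place, the replacement character is never produced, which is why the `?` versus `\uFFFD` difference has no effect on that code path.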