paulirwin commented on PR #1089:
URL: https://github.com/apache/lucenenet/pull/1089#issuecomment-2581662815

   @NightOwl888 Regarding:
   
   > For the Encoding-derived classes, the default fallback character is ?, but 
in Java it is \uFFFD. The new System.Text.Unicode.Utf8 class uses the same 
default fallback character as in Java.
   
   This is true only for `Encoding.ASCII` (at least among the encodings defined on 
`Encoding`). `Encoding.UTF8` returns `\uFFFD` by default (from csharprepl):
   
   ```
   > Encoding.ASCII.GetString(new byte[] { 0xc3 })
   "?"
   > Encoding.UTF8.GetString(new byte[] { 0xc3 })
   "�"
   > Encoding.UTF8.GetString(new byte[] { 0xc3 })[0] == '\ufffd'
   true
   ```
   I confirmed this is the case on net462 as well.
   
   The only place in non-test code where `Encoding.ASCII` is used is in 
`ConnectionCostsBuilder`, and that doesn't matter because this PR changes it to 
throw on invalid characters anyway.
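
For reference, here is a minimal sketch of one way to get throw-on-invalid decoding from the standard .NET encodings. It is not taken from this PR; the class name `StrictDecodingSketch` and the helper `ThrowsOnInvalid` are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text;

// Hypothetical illustration, not the code from the PR.
public static class StrictDecodingSketch
{
    // ASCII encoding that throws instead of substituting '?' for invalid input.
    public static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        EncoderFallback.ExceptionFallback,
        DecoderFallback.ExceptionFallback);

    // UTF-8 encoding that throws instead of substituting U+FFFD.
    public static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true if decoding the bytes raises DecoderFallbackException.
    public static bool ThrowsOnInvalid(Encoding encoding, byte[] bytes)
    {
        try
        {
            encoding.GetString(bytes);
            return false;
        }
        catch (DecoderFallbackException)
        {
            return true;
        }
    }

    public static void Main()
    {
        byte[] invalid = { 0xc3 }; // same byte as in the REPL session above

        Console.WriteLine(ThrowsOnInvalid(StrictAscii, invalid));
        Console.WriteLine(ThrowsOnInvalid(StrictUtf8, invalid));
        Console.WriteLine(ThrowsOnInvalid(Encoding.UTF8, invalid));
    }
}
```

The two strict encodings should reject the lone `0xc3` byte, while the default `Encoding.UTF8` substitutes `\uFFFD`, as in the REPL session above.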
   
   Given that we use UTF8 everywhere else, I don't think we need to make 
that change.


