NightOwl888 commented on issue #792:
URL: https://github.com/apache/lucenenet/issues/792#issuecomment-1900926732

   This doesn't look like a Lucene.NET issue at all. You are specifying 
`Encoding.UTF8` yourself.
   
   In .NET Framework, that gives you a `UTF8Encoding(true)` instance, which 
emits a BOM.
   
   
https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/encoding.cs#L1549
   
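   In .NET Core, `Encoding.Default` was changed to create a `UTF8Encoding` 
instance without a BOM, presumably to match the default behavior of Java. 
`Encoding.UTF8` is still documented as providing a BOM, so it is worth checking 
`GetPreamble()` on your target runtime rather than relying on either default.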
   
   
https://github.com/dotnet/runtime/blob/v6.0.26/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs#L1071
   
https://github.com/dotnet/runtime/blob/v6.0.26/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs#L80-L83
   
   In short, if you want UTF-8 output without a BOM, you have to ask for it 
explicitly. On .NET Framework, you have to call the constructor and pass `false` 
for the `encoderShouldEmitUTF8Identifier` parameter.
   
   ```c#
   // UTF8Encoding is in System.Text; passing false suppresses the BOM (preamble).
   var utf8NoBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
   ```
   
   
https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=net-8.0#system-text-utf8encoding-ctor(system-boolean)
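
   To see the effect of that constructor argument, here is a minimal sketch. 
`BomDemo` and `WriteWith` are hypothetical names made up for this illustration. 
It writes a one-character string through a `StreamWriter` into a `MemoryStream` 
and compares the byte counts. The last line prints the preamble length of 
`Encoding.UTF8` on whatever runtime you run it on, so you can check it rather 
than assume.

   ```c#
   using System;
   using System.IO;
   using System.Text;

   public static class BomDemo
   {
       // Hypothetical helper: writes text through a StreamWriter and returns the raw bytes.
       public static byte[] WriteWith(Encoding encoding, string text)
       {
           using (var stream = new MemoryStream())
           {
               // On a fresh stream, StreamWriter emits encoding.GetPreamble() before the text.
               using (var writer = new StreamWriter(stream, encoding))
               {
                   writer.Write(text);
               }
               // MemoryStream.ToArray() still works after the writer has closed the stream.
               return stream.ToArray();
           }
       }

       public static void Main()
       {
           var utf8WithBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);
           var utf8NoBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

           Console.WriteLine(WriteWith(utf8WithBom, "a").Length); // 4: BOM (EF BB BF) + 'a'
           Console.WriteLine(WriteWith(utf8NoBom, "a").Length);   // 1: just 'a'

           // Runtime-dependent check: inspect what the static property gives you.
           Console.WriteLine(Encoding.UTF8.GetPreamble().Length);
       }
   }
   ```

   Passing the explicitly constructed instance everywhere you write files keeps 
the output the same across runtimes, whatever the static defaults are.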


