NightOwl888 commented on issue #792: URL: https://github.com/apache/lucenenet/issues/792#issuecomment-1900926732
So, this doesn't look like an issue at all with Lucene.NET.

You are specifying `Encoding.UTF8`. In .NET Framework, when you do that you get `UTF8Encoding(true)` - which emits a BOM.

https://github.com/microsoft/referencesource/blob/master/mscorlib/system/text/encoding.cs#L1549

In .NET Core, they changed it to be the same as `Encoding.Default`, which creates a `UTF8Encoding` instance without a BOM. Presumably, this is to match the default behavior of Java.

https://github.com/dotnet/runtime/blob/v6.0.26/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs#L1071
https://github.com/dotnet/runtime/blob/v6.0.26/src/libraries/System.Private.CoreLib/src/System/Text/Encoding.cs#L80-L83

In short, if you want encoding without a BOM, you have to specify it that way. On .NET Framework, you have to call the constructor and pass `false` for the `encoderShouldEmitUTF8Identifier` parameter.

```c#
var utf8NoBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
```

https://learn.microsoft.com/en-us/dotnet/api/system.text.utf8encoding.-ctor?view=net-8.0#system-text-utf8encoding-ctor(system-boolean)
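To confirm what a given encoding instance does on your own runtime, it may help to inspect `GetPreamble()` and the bytes a `StreamWriter` actually produces. The following is a minimal sketch, not code from the issue: `BomDemo` and `WriteWith` are hypothetical names introduced for illustration, and it assumes the text is written through a `StreamWriter` over a fresh `MemoryStream`.

```c#
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Hypothetical helper: returns the raw bytes a StreamWriter
    // produces when writing `text` with the given encoding.
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        // ToArray() still works after the writer has closed the stream.
        return stream.ToArray();
    }

    public static void Main()
    {
        var utf8NoBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);
        var utf8WithBOM = new UTF8Encoding(encoderShouldEmitUTF8Identifier: true);

        // An encoding that emits no BOM has an empty preamble.
        Console.WriteLine(utf8NoBOM.GetPreamble().Length);                      // 0
        Console.WriteLine(utf8WithBOM.GetPreamble().Length);                    // 3

        // The BOM (EF BB BF) appears only when the encoding asks for it.
        Console.WriteLine(BitConverter.ToString(WriteWith(utf8WithBOM, "a")));  // EF-BB-BF-61
        Console.WriteLine(BitConverter.ToString(WriteWith(utf8NoBOM, "a")));    // 61
    }
}
```

Constructing the encoding explicitly, as above, keeps the output the same regardless of which runtime the code happens to run on.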