NightOwl888 commented on issue #1027:
URL: https://github.com/apache/lucenenet/issues/1027#issuecomment-2562788290

   Looks like you missed `OfflineSorter`. The tests specifically failed when it 
was configured to use a BOM, although I didn't analyze it in enough depth to 
find out why that was the case. No objections if you wish to investigate this, 
but it definitely makes a difference as far as the tests are concerned.
   
   It has gone through several rounds of refactoring since then, but currently 
it has a 
[`DEFAULT_ENCODING`](https://github.com/apache/lucenenet/blob/85c01412946ed1e2632cd2dfae4c672efd38caba/src/Lucene.Net/Util/OfflineSorter.cs#L44-L48)
 field that we added to ensure the tests pass. So, we have a couple of options:
   
   1. Remove the `DEFAULT_ENCODING` field and replace it with 
`IOUtils.CHARSET_UTF_8`. Update the OfflineSorter documentation for 
`ByteSequencesReader` and `ByteSequencesWriter` to indicate that constructor 
overloads that accept `BinaryReader` and `BinaryWriter` should use 
`IOUtils.CHARSET_UTF_8`.
   2. Initialize the `DEFAULT_ENCODING` field with the same instance as 
`IOUtils.CHARSET_UTF_8`.
   
   Given that we added this field specifically because `OfflineSorter` 
requires there to be no BOM (which differs from the .NET default), this could 
go either way. We only recently changed `IOUtils.CHARSET_UTF_8` to remove the 
BOM, so using it wasn't an option when the `DEFAULT_ENCODING` field was added. 
If it had been, it would have been reused in this case and the field wouldn't 
have been added.
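
   For reference, here is a minimal sketch of the BOM difference using only 
the .NET base class library. It is not taken from `OfflineSorter` or `IOUtils`; 
the `BomDemo` class and `WriteWith` helper are hypothetical names used for 
illustration, and it does not attempt to explain why the tests failed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; not part of Lucene.NET.
public static class BomDemo
{
    // Returns the bytes a StreamWriter produces for `text` with the given encoding.
    public static byte[] WriteWith(Encoding encoding, string text)
    {
        using var stream = new MemoryStream();
        using (var writer = new StreamWriter(stream, encoding))
        {
            writer.Write(text);
        }
        // MemoryStream.ToArray() still works after the writer has closed the stream.
        return stream.ToArray();
    }

    public static void Main()
    {
        // The static Encoding.UTF8 instance emits the UTF-8 byte order mark.
        Encoding withBom = Encoding.UTF8;
        // An explicitly constructed instance can opt out of the BOM.
        Encoding noBom = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false);

        Console.WriteLine(withBom.GetPreamble().Length);                  // 3
        Console.WriteLine(noBom.GetPreamble().Length);                    // 0
        Console.WriteLine(BitConverter.ToString(WriteWith(withBom, "a"))); // EF-BB-BF-61
        Console.WriteLine(BitConverter.ToString(WriteWith(noBom, "a")));   // 61
    }
}
```

   Both encodings produce identical bytes for the text itself; in this sketch 
they differ only in the three-byte preamble written at the start of the stream.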
   
   Side note: perhaps we should also rename `IOUtils.CHARSET_UTF_8` because it 
is public and "CharSet" is Java nomenclature. `ENCODING_UTF8_NO_BOM` would be a 
better name.


