[
https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563743#action_12563743
]
Steven Rowe commented on LUCENE-1157:
-------------------------------------
bq. there are unidentifiable characters in Changes.html. They are also in
CHANGES.txt. I'm sure I read something about why they are added but cannot find
it now.
The first three bytes of CHANGES.txt are a UTF-8 BOM (byte-order mark). In
Unicode encodings whose code units span more than one byte, e.g. UTF-16 and
UTF-32, the character U+FEFF is placed at the beginning of a stream to denote
the endianness of the character serialization.
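To make the endianness point concrete, here is a minimal Python sketch of how
U+FEFF serializes in the two UTF-16 byte orders. The helper name
utf16_byte_order is made up for illustration; it is not part of Lucene or of
any library.

```python
import codecs

# U+FEFF serialized in each UTF-16 byte order.
bom = "\ufeff"
big_endian = bom.encode("utf-16-be")     # bytes FE FF
little_endian = bom.encode("utf-16-le")  # bytes FF FE

# The standard library exposes the same byte sequences as constants.
assert big_endian == codecs.BOM_UTF16_BE
assert little_endian == codecs.BOM_UTF16_LE


def utf16_byte_order(data: bytes) -> str:
    """Guess the byte order of UTF-16 data from its leading BOM.

    Hypothetical helper: it assumes the input is already known to be
    UTF-16. (A UTF-32 little-endian BOM also starts with FF FE, so this
    check alone cannot tell the two encodings apart.)
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return "unknown"
```

A reader that sees FE FF first knows the most significant byte of each code
unit comes first; FF FE means the reverse.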
UTF-8 has no endianness (the byte order for any given character is fixed); a
BOM in UTF-8, where U+FEFF is serialized as the three bytes EF BB BF, serves
solely to indicate that the encoding of the stream is UTF-8.
Some Microsoft tools, such as Windows Notepad, write a BOM at the beginning of
the UTF-8 encoded files they save.
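For anyone who wants to check a file for these bytes or drop them before
further processing, the following is a minimal Python sketch, not what the
changes-to-HTML conversion in this issue actually does. The function name
strip_utf8_bom and the sample text are made up for illustration.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Made-up sample standing in for the start of a BOM-prefixed text file.
sample = codecs.BOM_UTF8 + "sample text".encode("utf-8")

# Decoding as plain "utf-8" keeps the BOM as a leading U+FEFF character,
# which is what shows up as an unidentifiable character in the output.
assert sample.decode("utf-8").startswith("\ufeff")

# Either strip the bytes first, or decode with "utf-8-sig", which skips a
# leading BOM when there is one.
assert strip_utf8_bom(sample).decode("utf-8") == "sample text"
assert sample.decode("utf-8-sig") == "sample text"
```

Input without a BOM passes through both approaches unchanged, so they are safe
to apply unconditionally.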
> Formatable changes log (CHANGES.txt is easy to edit but not so friendly to
> read by Lucene users)
> -------------------------------------------------------------------------------------------------
>
> Key: LUCENE-1157
> URL: https://issues.apache.org/jira/browse/LUCENE-1157
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Website
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Fix For: 2.4
>
> Attachments: lucene-1157-take2.patch, lucene-1157-take3.patch,
> lucene-1157.patch
>
>
> Background in http://www.nabble.com/formatable-changes-log-tt15078749.html