[
https://issues.apache.org/jira/browse/IO-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaniv Kunda updated IO-341:
---------------------------
Attachment: ByteOrderMark-char.patch
I'm sorry for mixing up different improvements in one patch - I'll open
distinct issues for them.
Regarding my constant: it does not contain a byte sequence, but the Java
char literal for the Unicode character U+FEFF, which is
documented at http://unicode.org/faq/utf_bom.html#BOM
That U+FEFF is encoded in UTF-16BE as the two bytes 0xFE, 0xFF is
coincidental; the constant is a character, not that byte pair.
The name is a preliminary choice, made to keep it short and simple, and it is
open to change, as with any other contribution. Other possible names
include {{ByteOrderMark.CHARACTER}},
{{ByteOrderMark.BOM_CHAR}}, {{ByteOrderMark.BOM_CHARACTER}},
{{ByteOrderMark.UNICODE_CHAR}}, etc.
And for the most important part - its use: you are right that if a file
contains a BOM, it can be any of those byte sequences. After all, a file is
merely a sequence of bytes.
But when working with files (or any other streams) as character streams instead
of byte streams, bytes and chars are converted by
InputStreamReader/OutputStreamWriter or CharsetDecoder/CharsetEncoder.
In that case, encoding the Unicode BOM character yields a different byte
sequence for each charset (which is exactly what the existing ByteOrderMark
instances represent).
For example, if you are working with a Writer and want to output a BOM:
{code:java}
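import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import org.apache.commons.io.ByteOrderMark;
import org.apache.commons.io.output.FileWriterWithEncoding;

public void writeWithBOM(String filename, String fileContent, Charset charset)
        throws IOException {
    try (Writer writer = new FileWriterWithEncoding(filename, charset)) {
        // ByteOrderMark.CHAR is the proposed char constant ('\uFEFF')
        writer.write(ByteOrderMark.CHAR);
        writer.write(fileContent);
    }
}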
{code}
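To make the point about per-charset byte sequences concrete, here is a minimal sketch using only the JDK. The names BomBytesDemo, BOM_CHAR, bomBytes and toHex are illustrative and not part of Commons IO; BOM_CHAR merely stands in for the proposed constant. The plain UTF-16 charset is left out on purpose, since the JDK encoder for it writes its own BOM first.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomBytesDemo {
    // U+FEFF as a char literal; a stand-in for the proposed ByteOrderMark.CHAR.
    static final char BOM_CHAR = '\uFEFF';

    /** Encodes the single BOM character with the given charset. */
    static byte[] bomBytes(Charset charset) {
        return String.valueOf(BOM_CHAR).getBytes(charset);
    }

    /** Formats bytes as space-separated upper-case hex, e.g. "EF BB BF". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same character yields a different byte sequence per charset.
        System.out.println("UTF-8:    " + toHex(bomBytes(StandardCharsets.UTF_8)));    // EF BB BF
        System.out.println("UTF-16BE: " + toHex(bomBytes(StandardCharsets.UTF_16BE))); // FE FF
        System.out.println("UTF-16LE: " + toHex(bomBytes(StandardCharsets.UTF_16LE))); // FF FE
    }
}
```

The three results match the byte sequences of the corresponding BOMs, which is why a single char constant is enough on the Writer side: the encoder picks the right bytes.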
I hope this clarifies the intended use.
> A constant for holding the BOM character (U+FEFF)
> --------------------------------------------------
>
> Key: IO-341
> URL: https://issues.apache.org/jira/browse/IO-341
> Project: Commons IO
> Issue Type: Improvement
> Components: Streams/Writers
> Reporter: Yaniv Kunda
> Priority: Minor
> Attachments: ByteOrderMark-char.patch
>
>
> This can be useful when working with readers/writers -
> can be put as a constant in ByteOrderMark, for example.