[ 
https://issues.apache.org/jira/browse/IO-341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaniv Kunda updated IO-341:
---------------------------

    Attachment: ByteOrderMark-char.patch

I'm sorry for mixing up different improvements in one patch - I'll open 
distinct issues for them.

Regarding my constant: it does not contain a byte sequence, but the Java 
char literal for the Unicode code point U+FEFF, which is 
documented at http://unicode.org/faq/utf_bom.html#BOM
That U+FEFF is encoded in UTF-16BE as the two bytes 0xFE, 0xFF is 
incidental: the constant holds a character, not bytes.

The name is a preliminary choice I made to keep it short and simple, and it is 
open to change, as with any contribution.  Other 
possible names include {{ByteOrderMark.CHARACTER}}, 
{{ByteOrderMark.BOM_CHAR}}, {{ByteOrderMark.BOM_CHARACTER}}, 
{{ByteOrderMark.UNICODE_CHAR}}, etc.

And for the most important part - its use: you are right that if a file 
contains a BOM, it can be any of those byte sequences. After all, a file is 
merely a sequence of bytes.
But when working with files (or any other streams) as character streams instead 
of byte streams, one converts between bytes and chars via 
InputStreamReader/OutputStreamWriter or CharsetDecoder/CharsetEncoder.
In that case, the Unicode BOM character, once encoded, yields a 
different byte sequence for each charset (which is exactly what ByteOrderMark 
represents).

For example, if you are working with a Writer and want to output a BOM:
{code:java}
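// ByteOrderMark.CHAR is the char constant (U+FEFF) proposed in the attached patch.
public void writeWithBOM(String filename, String fileContent, Charset charset)
        throws IOException {
    try (Writer writer = new FileWriterWithEncoding(filename, charset)) {
        // The writer encodes the BOM character using the given charset.
        writer.write(ByteOrderMark.CHAR);
        writer.write(fileContent);
    }
}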
{code}
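
To illustrate the per-charset conversion described above, here is a minimal 
sketch using only the JDK. The class name and the local BOM_CHAR constant are 
placeholders for illustration (standing in for the proposed 
ByteOrderMark.CHAR); they are not part of Commons IO.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomEncodingDemo {

    // Placeholder for the proposed ByteOrderMark.CHAR constant (an assumption here).
    static final char BOM_CHAR = '\uFEFF';

    // Encodes the single BOM character with the given charset.
    static byte[] encodeBom(Charset charset) {
        return String.valueOf(BOM_CHAR).getBytes(charset);
    }

    // Formats bytes as space-separated upper-case hex, e.g. "EF BB BF".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE,
            StandardCharsets.UTF_16LE
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + toHex(encodeBom(cs)));
        }
        // Prints:
        // UTF-8: EF BB BF
        // UTF-16BE: FE FF
        // UTF-16LE: FF FE
    }
}
```

The same character thus produces three different byte sequences, one per 
charset, which is why a char constant is convenient on the Writer side.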

I hope this clarifies the intended use.
                
> A constant for holding the BOM character (U+FEFF) 
> --------------------------------------------------
>
>                 Key: IO-341
>                 URL: https://issues.apache.org/jira/browse/IO-341
>             Project: Commons IO
>          Issue Type: Improvement
>          Components: Streams/Writers
>            Reporter: Yaniv Kunda
>            Priority: Minor
>         Attachments: ByteOrderMark-char.patch
>
>
> This can be useful when working with readers/writers -
> can be put as a constant in ByteOrderMark, for example.
