BOMExclusionInputStream - an InputStream for UTF-8 data that ignores an initial 
byte order mark
-----------------------------------------------------------------------------------------------

                 Key: IO-178
                 URL: https://issues.apache.org/jira/browse/IO-178
             Project: Commons IO
          Issue Type: New Feature
          Components: Streams/Writers
    Affects Versions: 1.4
            Reporter: Keith D Gregory
            Priority: Minor


Microsoft tools have the unpleasant habit of writing a byte order mark (the 
three-byte sequence 0xEF 0xBB 0xBF) at the start of a UTF-8 encoded file.

The UTF-8 CharsetDecoder supplied with the JDK does not discard these bytes, 
but instead decodes them to the BOM character (U+FEFF); see 
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911 for discussion on 
this.

This makes life unpleasant for anyone who is processing text data, as the 
program must look for this character and ignore it.

The BOMExclusionInputStream class is a work-around: it recognizes the BOM at 
the start of the stream, and skips over it.
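
As a rough illustration of the approach described above, here is a minimal 
sketch. It is not the class proposed in this issue: the helper name 
Utf8BomSkipper and its skipBom method are hypothetical, and it assumes a 
java.io.PushbackInputStream is an acceptable way to peek at the first three 
bytes and put them back when they are not a BOM.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper for illustration only; not the proposed
// BOMExclusionInputStream implementation.
final class Utf8BomSkipper {

    // The UTF-8 encoding of U+FEFF, as given in the issue text.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private Utf8BomSkipper() {
    }

    // Returns a stream positioned after a leading UTF-8 BOM, if one is present.
    static InputStream skipBom(InputStream in) throws IOException {
        PushbackInputStream pushback =
                new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // read() may return fewer bytes than requested, so loop until the
        // buffer is full or the stream ends.
        int count = 0;
        while (count < head.length) {
            int n = pushback.read(head, count, head.length - count);
            if (n < 0) {
                break;
            }
            count += n;
        }

        // Compare as unsigned values, because Java bytes are signed.
        boolean isBom = count == UTF8_BOM.length;
        for (int i = 0; isBom && i < UTF8_BOM.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: push the bytes back so the caller sees them unchanged.
        if (!isBom && count > 0) {
            pushback.unread(head, 0, count);
        }
        return pushback;
    }
}
```

A caller would wrap the returned stream in an InputStreamReader created with 
the UTF-8 charset; the decoded text then no longer starts with U+FEFF when the 
input began with a BOM. Streams that do not begin with the three BOM bytes, 
including streams shorter than three bytes, pass through unchanged.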

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.