BOMExclusionInputStream - an InputStream for UTF-8 data that ignores an initial
Byte Order Mark
-----------------------------------------------------------------------------------------------
Key: IO-178
URL: https://issues.apache.org/jira/browse/IO-178
Project: Commons IO
Issue Type: New Feature
Components: Streams/Writers
Affects Versions: 1.4
Reporter: Keith D Gregory
Priority: Minor
Microsoft tools have the unpleasant habit of writing a byte order mark (the
three-byte sequence 0xEF 0xBB 0xBF) at the start of a UTF-8 encoded file.
The CharsetDecoder supplied with the JDK does not discard these bytes;
instead it decodes them to the BOM character (U+FEFF) and returns it as the
first character of the text. See
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6378911 for discussion.
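
To make the decoder behaviour concrete, here is a small sketch that decodes
five bytes (the UTF-8 BOM followed by "hi") with the JDK's UTF-8 charset. The
class and method names (BomDecodeDemo, decodeUtf8) are made up for this
illustration and are not part of the JDK or Commons IO.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption for illustration only.
class BomDecodeDemo {

    // Decodes the bytes with the JDK's built-in UTF-8 decoder.
    static String decodeUtf8(byte[] bytes) {
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // UTF-8 BOM (EF BB BF) followed by the ASCII text "hi".
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        String text = decodeUtf8(data);

        // The BOM survives decoding as a leading U+FEFF character.
        System.out.println(text.length());                       // prints 3
        System.out.println(Integer.toHexString(text.charAt(0))); // prints feff
    }
}
```

The decoded string has three characters rather than two, so any code that
compares or parses the first token of the file sees the extra character.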
This makes life unpleasant for anyone who is processing text data, as the
program must look for this character and ignore it.
The BOMExclusionInputStream class is a work-around: it recognizes the BOM at
the start of the stream, and skips over it.
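
The approach described above could be sketched roughly as follows. This is
not the class attached to the issue: the name Utf8BomSkippingInputStream and
the use of a PushbackInputStream are assumptions made for this example. The
wrapper reads up to three bytes on first access, drops them if they are
exactly EF BB BF, and otherwise pushes them back unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical sketch; class name and design are assumptions, not the
// implementation proposed in the issue.
class Utf8BomSkippingInputStream extends FilterInputStream {

    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private boolean checked = false;

    Utf8BomSkippingInputStream(InputStream in) {
        // The pushback buffer holds the bytes we peeked at if they are not a BOM.
        super(new PushbackInputStream(in, UTF8_BOM.length));
    }

    // Runs once, on first access: consumes a leading BOM, or restores the bytes.
    private void skipBomIfPresent() throws IOException {
        if (checked) {
            return;
        }
        checked = true;
        PushbackInputStream pin = (PushbackInputStream) in;
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        while (n < head.length) {
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream is shorter than a BOM
            }
            n += r;
        }
        boolean isBom = (n == head.length);
        for (int i = 0; isBom && i < head.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!isBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: give the bytes back
        }
    }

    @Override
    public int read() throws IOException {
        skipBomIfPresent();
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        skipBomIfPresent();
        return super.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        skipBomIfPresent();
        return super.skip(n);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (InputStream s = new Utf8BomSkippingInputStream(new ByteArrayInputStream(data))) {
            int b;
            while ((b = s.read()) != -1) {
                System.out.print((char) b);
            }
        }
    }
}
```

Because the check happens at the byte level, the wrapper can be placed
beneath an InputStreamReader or any other decoder, and streams without a BOM
pass through unchanged.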