Apologies if this is not the correct place to post this, but i18n seemed more focused on languages and localization than on the mechanics of transcoding.

I have noticed a behavioral difference in JDK8 when decoding a two-byte Shift-JIS sequence. Specifically, JDK8 reports malformed input for what should be a valid Shift-JIS sequence, where JDK7 reported that the character was unmappable.

The code to reproduce is fairly simple:

    import java.nio.*;
    import java.nio.charset.*;

    byte[] bytes = {(byte)0xEF, 0x40};
    CharsetDecoder decoder = Charset.forName("Shift-JIS").newDecoder();
    System.out.println(decoder.decode(ByteBuffer.wrap(bytes), CharBuffer.allocate(2), false));

Note that this pumps the decoder directly and passes false for endOfInput, i.e. it tells the decoder that the input may be partial. We use this mechanism in JRuby for transcoding an arbitrary byte[] from one encoding to another.

The result of running this on JDK7 is "UNMAPPABLE[2]", while the result on JDK8 is "MALFORMED[1]".

Information online is spotty as to whether this sequence is valid. It does appear in the table for [JIS X 0213](http://x0213.org/codetable/sjis-0213-2004-std.txt), and several articles on Shift-JIS claim that it is at worst undefined and at best valid. So I'm leaning toward this being a bug in JDK8's Shift-JIS decoder. Note that JDK7 calls it "unmappable", which may mean that these bytes represent a character with no equivalent in Unicode.

I have uploaded my code to GitHub here: https://github.com/headius/jdk8_utf8_decoding_bug

Thoughts?

- Charlie
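
P.S. In case it helps with triage, here is a slightly fuller, self-contained sketch of the same check that spells out which kind of CoderResult came back and how many bytes it covers. The class name SjisDecodeCheck and the helpers decodeOnce and describe are made up for illustration and are not necessarily what is in the GitHub repo. It uses the canonical charset name "Shift_JIS" and only standard java.nio.charset calls.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;

public class SjisDecodeCheck {
    // Hypothetical helper: run a single decode() call on a fresh decoder,
    // passing endOfInput through so partial input (false) can be tested.
    static CoderResult decodeOnce(String charsetName, byte[] bytes, boolean endOfInput) {
        CharsetDecoder decoder = Charset.forName(charsetName).newDecoder();
        CharBuffer out = CharBuffer.allocate(Math.max(2, bytes.length));
        return decoder.decode(ByteBuffer.wrap(bytes), out, endOfInput);
    }

    // Hypothetical helper: turn a CoderResult into a readable description.
    // length() is only valid for error results, so it is only called there.
    static String describe(CoderResult result) {
        if (result.isMalformed()) {
            return "malformed, length " + result.length();
        }
        if (result.isUnmappable()) {
            return "unmappable, length " + result.length();
        }
        return result.isUnderflow() ? "underflow" : "overflow";
    }

    public static void main(String[] args) {
        // The two bytes from the report above.
        byte[] bytes = {(byte) 0xEF, 0x40};
        CoderResult result = decodeOnce("Shift_JIS", bytes, false);
        // The outcome depends on the JDK in use, so print the version with it.
        System.out.println(System.getProperty("java.version") + ": " + describe(result));
    }
}
```

Running this on each JDK should make the difference easy to compare side by side: the JDK7 result corresponds to "unmappable, length 2" and the JDK8 result to "malformed, length 1".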