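I think that when an encoder encounters a lone high surrogate (one not followed by a low surrogate), it is correct to return a malformed length of either 1 or 2. A thought experiment: a cosmic ray that mangled exactly one char could have produced this situation from an original sequence of length either 1 or 2, depending on which char was mangled. If a BMP char was turned into a high surrogate, the bad sequence is 1 char long and the following char is valid input; if the low surrogate of a valid pair was turned into something else, the bad sequence is 2 chars long.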
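The thought experiment can be made concrete with a small sketch. The class `CosmicRay` and its helpers are hypothetical, written only for illustration. It mangles one char in each of two different original strings and gets the identical input back, so an encoder looking only at that input cannot know whether the bad unit was 1 or 2 chars long.

```java
// Hypothetical illustration of the "cosmic ray" thought experiment:
// two different originals, each with exactly one char mangled,
// can yield the same input containing a lone high surrogate.
final class CosmicRay {

    /** Returns a copy of s with the char at index i replaced by c. */
    static String mangle(String s, int i, char c) {
        char[] chars = s.toCharArray();
        chars[i] = c;
        return new String(chars);
    }

    /** True if the char at i is a high surrogate not followed by a low surrogate. */
    static boolean isLoneHighSurrogate(CharSequence s, int i) {
        return Character.isHighSurrogate(s.charAt(i))
                && (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1)));
    }

    public static void main(String[] args) {
        // Case A: the low surrogate of U+1F600 was mangled; the bad unit was 2 chars.
        String fromPair = mangle("\uD83D\uDE00", 1, 'x');
        // Case B: a BMP char 'A' was mangled into a high surrogate; the bad unit
        // was 1 char, and the following 'x' is perfectly valid input.
        String fromSingle = mangle("Ax", 0, '\uD83D');

        System.out.println(fromPair.equals(fromSingle));      // true
        System.out.println(isLoneHighSurrogate(fromPair, 0)); // true
    }
}
```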
Not a Defect.

Martin

On Tue, Sep 9, 2008 at 14:38, Ulf Zibis <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> as you may have noticed, I'm working on enhancing the sun.nio.cs package:
> https://java-nio-charset-enhanced.dev.java.net/
>
> Unicode code points above \uFFFF are represented in Java by 2 chars, called
> surrogates.
> The 1st char, called the high surrogate, is in the range \uD800..\uDBFF, and
> the 2nd char, called the low surrogate, is in the range \uDC00..\uDFFF.
>
> 1.) If the 1st char is erroneously in the range \uDC00..\uDFFF,
> sun.nio.cs encoders return CoderResult.malformedForLength(1). OK.
> 2.) If the 1st char is correctly in the range \uD800..\uDBFF, but the 2nd
> char is erroneously NOT in the range \uDC00..\uDFFF, sun.nio.cs encoders
> mostly (I have not tested all) also return
> CoderResult.malformedForLength(1).
>
> IMO, in the 2nd case the encoders should return
> CoderResult.malformedForLength(2), because the erroneous code point
> consists of 2 chars.
> Additionally, it would then be much easier to skip the erroneous code point
> in the affected java.nio.CharBuffer, simply by using CoderResult.length().
>
> See also:
> http://java.sun.com/javase/6/docs/api/java/nio/charset/CoderResult.html#length()
>
> What do you think about this?
>
> I'm thinking about filing a bug about this "wrong" encoder result.
>
> Thanks in advance for a brisk discussion.
>
> -Ulf
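Ulf's point about skipping bad input via `CoderResult.length()` can be sketched with the public `CharsetEncoder`/`CoderResult` API. The class and method names are hypothetical, and silently dropping malformed input (rather than substituting a replacement) is an assumed policy chosen for illustration. Whichever length the encoder reports, the loop advances past exactly that many chars: with a length of 1, the char after a lone high surrogate is re-examined and encoded if valid; with a length of 2, it would be discarded along with the surrogate.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

// Sketch: skip malformed input by advancing the CharBuffer by CoderResult.length().
// Names are hypothetical; dropping bad input is an illustrative policy only.
final class SkipMalformed {

    static byte[] encodeSkippingMalformed(String s) {
        // A fresh encoder reports malformed/unmappable input instead of replacing it.
        CharsetEncoder enc = Charset.forName("UTF-8").newEncoder();
        CharBuffer in = CharBuffer.wrap(s);
        // UTF-8 needs at most 3 bytes per char, so this buffer cannot overflow.
        ByteBuffer out = ByteBuffer.allocate(3 * s.length() + 4);
        while (true) {
            CoderResult cr = enc.encode(in, out, true);
            if (cr.isUnderflow()) {
                break; // all input consumed
            }
            if (cr.isMalformed() || cr.isUnmappable()) {
                // The bad sequence starts at in.position(); skip exactly cr.length() chars.
                in.position(in.position() + cr.length());
            } else {
                throw new IllegalStateException("unexpected result: " + cr);
            }
        }
        enc.flush(out);
        out.flip();
        byte[] result = new byte[out.remaining()];
        out.get(result);
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Lone high surrogate followed by a valid 'x': whether the 'x' survives
        // depends on the malformed length this JDK's encoder reports (1 or 2).
        byte[] bytes = encodeSkippingMalformed("\uD83Dx");
        System.out.println(new String(bytes, "UTF-8"));
    }
}
```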
