Dmitry M. Kononov wrote:
Hi Richard,

On 4/6/06, Richard Liang <[EMAIL PROTECTED]> wrote:

Dmitry M. Kononov wrote:
As you correctly noticed, the cause of this issue is that Harmony uses
little-endian byte order when an encoded UTF-16 sequence has no
byte-order mark. However, the spec addresses this case explicitly:

"When decoding, the UTF-16 charset interprets a byte-order mark to
indicate the byte order of the stream but defaults to big-endian if
there is no byte-order mark; when encoding, it uses big-endian byte
order and writes a big-endian byte-order mark."
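
The decoding half of that rule can be illustrated with a minimal sketch. It
assumes a runtime whose "UTF-16" charset follows the quoted spec; the class
and method names are made up for illustration and are not part of the
HARMONY-308 test case.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
class Utf16DecodeSketch {
    // Decode bytes with the generic "UTF-16" charset (not UTF-16BE/UTF-16LE).
    static String decodeUtf16(byte[] bytes) {
        return Charset.forName("UTF-16").decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // U+041B with no byte-order mark: the spec says big-endian is assumed.
        byte[] noBom = {0x04, 0x1B};
        // The same character in little-endian order, announced by an FF FE mark.
        byte[] littleEndianBom = {(byte) 0xFF, (byte) 0xFE, 0x1B, 0x04};

        System.out.println(decodeUtf16(noBom).equals("\u041b"));
        System.out.println(decodeUtf16(littleEndianBom).equals("\u041b"));
    }
}
```

On a spec-compliant runtime both inputs should decode to the same single
character, because the byte-order mark only selects the byte order and is not
returned as part of the text.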


Hello Dmitry,

Yes. Although Harmony and the RI use different byte orders, both write a
byte-order mark (U+FEFF), so I think both Harmony and the RI are
compliant with the specification. So could we regard Harmony-308 as "not
a bug"?


I think Harmony's behavior in this case is inconsistent with the Java spec,
since the spec defines the expected behavior explicitly:
"when encoding, it uses big-endian byte order and writes a big-endian
byte-order mark." But Harmony's encode() returns bytes in little-endian
order.

It seems I do not understand why you think Harmony follows the spec
correctly in this case. :)
I am really sorry for my misunderstanding.

You're right, Dmitry. :-) Now I agree with you that Harmony is not compliant
with the specification. We will discuss this with our charset provider, ICU,
to determine how to fix the issue. Thanks a lot.

From a test case attached to HARMONY-308:

1) We have a char array that has no byte-order mark:
    private static final char chars[] = {
        0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
        0x0441,0x0438,0x0438};

2) We have the byte array that we expect encode() to return:
    private static final byte bytes[] = {
        (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
        (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
        (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
        (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};

Please note, according to the spec we expect bytes returned by encode() in
big-endian byte order. So, we expect the FEFF byte-order mark.
Do you agree this expectation is correct and consistent with the spec?
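
The expectation above can be sketched as a small standalone check. This is
only an illustration that reuses the two arrays from the test case. The class
and method names are made up, and the comparison should hold only on a runtime
whose "UTF-16" encoder follows the quoted spec.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical standalone check, not the actual HARMONY-308 test class.
class Utf16EncodeSketch {
    static final char[] CHARS = {
        0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020,
        0x0420, 0x043e, 0x0441, 0x0441, 0x0438, 0x0438};

    // Expected output: FE FF mark, then each char high byte first.
    static final byte[] EXPECTED = {
        (byte) 254, (byte) 255, 4, 27, 4, 53, 4, 66, 4, 62, 0, 32, 4, 50,
        0, 32, 4, 32, 4, 62, 4, 65, 4, 65, 4, 56, 4, 56};

    // Encode with the generic "UTF-16" charset and copy out the result.
    static byte[] encodeUtf16(char[] chars) {
        ByteBuffer buf = Charset.forName("UTF-16").encode(CharBuffer.wrap(chars));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // True if the output begins with the big-endian mark FE FF.
    static boolean startsWithBigEndianBom(byte[] b) {
        return b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF;
    }

    public static void main(String[] args) {
        byte[] actual = encodeUtf16(CHARS);
        System.out.println("big-endian BOM: " + startsWithBigEndianBom(actual));
        System.out.println("matches expected bytes: " + Arrays.equals(actual, EXPECTED));
    }
}
```

A runtime that encodes in little-endian order would instead start its output
with FF FE and swap each byte pair, so both checks would report false.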

Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division



--
Richard Liang
China Software Development Lab, IBM
