Dmitry M. Kononov wrote:
Hi Richard,
On 4/6/06, Richard Liang <[EMAIL PROTECTED]> wrote:
Dmitry M. Kononov wrote:
As you correctly noticed, the cause of this issue is that Harmony uses
little-endian byte order when an encoded UTF-16 sequence has no
byte-order mark. However, the spec explicitly covers this case:
"When decoding, the UTF-16 charset interprets a byte-order mark to indicate
the byte order of the stream but defaults to big-endian if there is no
byte-order mark; when encoding, it uses big-endian byte order and writes a
big-endian byte-order mark."
Hello Dmitry,
Yes, Harmony and the RI use different byte orders. However, since both
Harmony and the RI write a byte-order mark (U+FEFF), I think both are
compliant with the specification. So could we regard HARMONY-308 as "not
a bug"?
I think Harmony's behavior in this case is inconsistent with the Java spec,
since the spec explicitly defines the expected behavior:
"when encoding, it uses big-endian byte order and writes a big-endian
byte-order mark." But Harmony's encode() returns bytes in little-endian
order.
I don't quite understand why you think Harmony follows the spec
correctly in this case. :)
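The disagreement comes down to which byte-order marks the spec permits on output. U+FEFF serialized big-endian is FE FF and little-endian is FF FE. A decoder accepts either because it honors the mark, but the encoding clause quoted above allows only FE FF. The hypothetical helper below (`BomOrder.detect` is illustrative, not a Harmony or JDK API) classifies an encoded array by its leading mark, so encoder output can be checked against that clause.

```java
import java.nio.charset.Charset;

class BomOrder {
    enum Order { BIG_ENDIAN, LITTLE_ENDIAN, NONE }

    // Classifies a UTF-16 byte sequence by its leading byte-order mark, if any.
    static Order detect(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xff;
            int b1 = b[1] & 0xff;
            if (b0 == 0xfe && b1 == 0xff) {
                return Order.BIG_ENDIAN;    // U+FEFF written big-endian
            }
            if (b0 == 0xff && b1 == 0xfe) {
                return Order.LITTLE_ENDIAN; // U+FEFF written little-endian
            }
        }
        return Order.NONE;
    }

    public static void main(String[] args) {
        byte[] encoded = "\u041b".getBytes(Charset.forName("UTF-16"));
        // Per the spec, the generic "UTF-16" encoder should report BIG_ENDIAN.
        System.out.println(detect(encoded));
    }
}
```

Under this reading, an encoder whose output is classified as `LITTLE_ENDIAN` still produces a stream that decodes correctly, yet it does not follow the encoding rule.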
I am really sorry for my misunderstanding.
You're right, Dmitry. :-) Now I agree with you that Harmony is not compliant
with the specification. We will discuss with our charset provider, ICU,
how to fix this issue. Thanks a lot.
From a test case attached to HARMONY-308:
1) We have a char array that has no byte-order mark:
private static final char[] chars = {
    // "Лето в России"
    0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020,
    0x0420, 0x043e, 0x0441, 0x0441, 0x0438, 0x0438};
2) We have the byte array that we expect encode() to return:
private static final byte[] bytes = {
    (byte) 254, (byte) 255,  // big-endian byte-order mark FE FF
    (byte) 4, (byte) 27, (byte) 4, (byte) 53, (byte) 4, (byte) 66,
    (byte) 4, (byte) 62, (byte) 0, (byte) 32, (byte) 4, (byte) 50,
    (byte) 0, (byte) 32, (byte) 4, (byte) 32, (byte) 4, (byte) 62,
    (byte) 4, (byte) 65, (byte) 4, (byte) 65, (byte) 4, (byte) 56,
    (byte) 4, (byte) 56};
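The two arrays above can be wired into a self-contained check. This sketch is not the test attached to HARMONY-308; the class name `Harmony308Check` and the helper `encodeUtf16` are hypothetical. It calls `Charset.encode(CharBuffer)` and copies out only the bytes actually produced, because the returned buffer's backing array may be larger than its limit. The expected bytes are the same values as above, written in hex.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

class Harmony308Check {
    // Input without a byte-order mark: "Лето в России".
    static final char[] CHARS = {
        0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020,
        0x0420, 0x043e, 0x0441, 0x0441, 0x0438, 0x0438};

    // Expected: big-endian mark FE FF, then each char as high byte, low byte.
    static final byte[] BYTES = {
        (byte) 0xfe, (byte) 0xff,
        0x04, 0x1b, 0x04, 0x35, 0x04, 0x42, 0x04, 0x3e, 0x00, 0x20,
        0x04, 0x32, 0x00, 0x20, 0x04, 0x20, 0x04, 0x3e, 0x04, 0x41,
        0x04, 0x41, 0x04, 0x38, 0x04, 0x38};

    // Encodes with the generic "UTF-16" charset and copies the produced bytes.
    static byte[] encodeUtf16(char[] chars) {
        ByteBuffer bb = Charset.forName("UTF-16").encode(CharBuffer.wrap(chars));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] actual = encodeUtf16(CHARS);
        if (Arrays.equals(BYTES, actual)) {
            System.out.println("encode() output matches the spec");
        } else {
            System.out.println("mismatch: " + Arrays.toString(actual));
        }
    }
}
```

On an implementation that encodes little-endian, the mismatch branch would show the mark as -1, -2 (FF FE) and each pair of bytes swapped.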
Please note that, according to the spec, we expect encode() to return bytes
in big-endian order, so we expect the byte-order mark U+FEFF to appear as the
bytes FE FF (254, 255).
Do you agree this expectation is correct and consistent with the spec?
Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division
--
Richard Liang
China Software Development Lab, IBM