Re: RFR: 8043592: The basic XML parser based on UKit fails to read XML files encoded in UTF-16BE or LE

Xueming Shen Thu, 22 May 2014 10:39:01 -0700

Hi

(1) Do we really need the shifts at lines 2989/90 and 2994/95? It appears to me
     that those bytes have already been determined to be zero; we are talking
     about mChar[0] = '<' and mChar[1] = '?' here, right?


(2) For the test, maybe we should just call p.loadFromXML(in)? That path should
     verify the fix as well (it is the real use scenario), right?

(3) Do we have tests for the UTF-16 BOM? If not, I would suggest adding
     UTF-16BE/LE-BOM to the charset[], just in case.
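
A rough sketch of what points (2) and (3) could look like together: store a
property as XML in UTF-16BE or UTF-16LE, optionally prepend a BOM by hand, and
read it back through loadFromXML. The class name, the helper roundTrip, and the
sample key are hypothetical and not taken from the webrev; the no-BOM cases are
expected to pass only on a JDK that includes the fix under review.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Hypothetical test helper; not part of the webrev under review.
class Utf16PropertiesRoundTrip {

    // Stores one property as XML in the given encoding, optionally with a
    // hand-written BOM in front, then loads it back and returns the value.
    static String roundTrip(String encoding, boolean prependBom) throws IOException {
        Properties original = new Properties();
        original.setProperty("greeting", "hello");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (prependBom) {
            // BOM is FE FF for big-endian and FF FE for little-endian.
            if (encoding.equals("UTF-16BE")) {
                out.write(0xFE);
                out.write(0xFF);
            } else {
                out.write(0xFF);
                out.write(0xFE);
            }
        }
        // With UTF-16BE/UTF-16LE the encoder itself emits no BOM.
        original.storeToXML(out, null, encoding);

        Properties loaded = new Properties();
        loaded.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
        return loaded.getProperty("greeting");
    }

    public static void main(String[] args) throws IOException {
        for (String enc : new String[] {"UTF-16BE", "UTF-16LE"}) {
            System.out.println(enc + " without BOM: " + roundTrip(enc, false));
            System.out.println(enc + " with BOM:    " + roundTrip(enc, true));
        }
    }
}
```

Going through loadFromXML rather than the parser directly exercises the same
path an application would use, which is the point of (2).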

thanks!
-Sherman

On 05/22/2014 09:30 AM, huizhe wang wrote:

While verifying and testing 8042889, we noticed that the tiny XML parser failed 
on UTF-16BE and UTF-16LE input. The cause was that the parser was implemented 
to abide by the XML specification, which requires entities encoded in UTF-16 to 
begin with a BOM. The test we used sent a byte array without a BOM to the 
parser, and thus failed.

Since it's not uncommon for an XML file to lack a BOM, I borrowed the technique 
used in Xerces to add an additional check for UTF-16 encoding.  Please review.

http://cr.openjdk.java.net/~joehw/jdk9/8043592/webrev/
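
For readers without the webrev at hand, a BOM-less UTF-16 check of this kind
can be sketched as below. It follows the general autodetection idea from the
XML specification (the first four bytes of "<?" differ by byte order) and is
not the actual patch; the class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only; the names here are assumptions, not the parser's code.
class Utf16Sniffer {

    // Returns the UTF-16 flavour implied by the first four bytes of a
    // BOM-less document that starts with "<?", or null if neither matches.
    static Charset sniff(byte[] b) {
        if (b.length < 4) {
            return null;
        }
        int b0 = b[0] & 0xFF;
        int b1 = b[1] & 0xFF;
        int b2 = b[2] & 0xFF;
        int b3 = b[3] & 0xFF;

        // 00 3C 00 3F is "<?" with the high (zero) byte first: big-endian.
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) {
            return StandardCharsets.UTF_16BE;
        }
        // 3C 00 3F 00 is "<?" with the low byte first: little-endian.
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) {
            return StandardCharsets.UTF_16LE;
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] be = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16BE);
        byte[] le = "<?xml version=\"1.0\"?>".getBytes(StandardCharsets.UTF_16LE);
        System.out.println(sniff(be));
        System.out.println(sniff(le));
    }
}
```

Once a pattern has matched, the two leading characters are known to be '<' and
'?', so a caller can assign them directly instead of reassembling them from the
raw bytes.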

Thanks,
Joe
