Change bargull-20110327-wHm by bargull@Bargull02 on 2011-03-27 13:48:00
in /home/anba/src/svn/openlaszlo/trunk
for http://svn.openlaszlo.org/openlaszlo/trunk
Summary: Support UTF-16 encoding for files
Bugs Fixed: LPP-1657 (UTF16 data broken)
Technical Reviewer: ptw
QA Reviewer: (pending)
Overview:
UTF-16 encoding support is a bit inconsistent, it isn't supported for
some files (e.g. lzx or proxied xml from server), while for other files
it is (e.g. css files). In addition to that, I've also encountered
"duplicated" code for BOM detection or encoding detection from the
xml-declaration (in "<?xml encoding=... ?>"). "duplicated" is put in
quotes here, because even though the code was used for the same purpose,
it was slightly different leading to other results depending on which
method was called. Here's an example for BOM detection from FileUtils:
FileUtils contained several methods to read the BOM, but all of them
were kind of broken:
- detectBOMEncoding() used mark() to mark the current position in the
stream, but never called reset()
- stripByteOrderMark(PushbackInputStream) mixed up UTF-16LE and
UTF-16BE, and it may unread too much bytes (if the input stream contains
less than three bytes)
- stripByteOrderMark(byte[]) was more or less a plain copy of the other
stripByteOrderMark() method, duplicate code, yuk...
Details:
FileUtils:
#Encoding
-- Simple enum for the encoding information (utf-8 and utf-16 only)
#readString(Reader)
-- New utility method to return the content of a Reader
#getXMLStreamReader(InputStream, String)
-- no longer read the complete InputStream into memory, instead use
mark() to reset the position resp. SequenceInputStream
-- use BOM to deduce proper charset to read xml-prolog, if BOM not
present, guess possible encoding per auto-detection described in xml
specification
#readFully(InputStream, byte[])
-- helper method to read from InputStream into byte array
#skipFully(InputStream, long)
-- helper method to skip() bytes more reliably
#getEncoding(byte[])
-- helper method to return encoding from BOM
#guessXMLEncoding(byte[])
-- helper method to guess possible encoding by reading first bytes of input
#getBOMEncoding(BufferedInputStream)
-- use getEncoding() method to retrieve encoding from BOM
#makeXMLReaderForFile(String, String)
-- use toByteArrayInputStream() to read complete input into memory now
that getXMLStreamReader() no longer uses in-memory streams
#toByteArrayInputStream(InputStream)
-- reads InputStream content into memory and returns
ByteArrayInputStream with the same content
#stripByteOrderMark(PushbackInputStream)
-- use getEncoding() to read BOM
-- don't unread bytes after EOF
#stripByteOrderMark(byte[])
-- use getEncoding() to read BOM
CSSHandler:
- use new methods from FileUtils to avoid duplicating code
HttpData:
- use encoding detection from FileUtils
XMLGrabber:
- the PullParser we're using doesn't perform any encoding detection,
therefore fallback to the utility methods in FileUtils
ResponderXML:
- don't read files unconditionally in a certain encoding (here: UTF-8)
instead just return the plain bytes
FileUtils_Test2:
- test case for the new resp. changed method from FileUtils
WholeFile_Test:
- initialize ReprocessComments.setOptions() in order to run test case
build.xml:
- re-enable FileUtils_Test, according to LPP-2054 it is currently
failing, but I couldn't reproduce the failure
- add FileUtils_Test2
- remove stale EDU/** entry from the oswego concurrent classes
- profiler properties for visual vm, they're not enabled by default but
useful for things like LPP-9841
Tests:
test case from LPP-1657, junit test case
Files:
M WEB-INF/lps/server/build.xml
M WEB-INF/lps/server/src/org/openlaszlo/js2doc/WholeFile_Test.java
M WEB-INF/lps/server/src/org/openlaszlo/css/CSSHandler.java
M WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils.java
A WEB-INF/lps/server/src/org/openlaszlo/utils/FileUtils_Test2.java
M
WEB-INF/lps/server/src/org/openlaszlo/servlets/responders/ResponderXML.java
M WEB-INF/lps/server/src/org/openlaszlo/data/HttpData.java
M WEB-INF/lps/server/src/org/openlaszlo/data/XMLGrabber.java
Changeset:
http://svn.openlaszlo.org/openlaszlo/patches/bargull-20110327-wHm.tar