XmlPullParser parses strings with platform's default charset

Martin Grigorov Mon, 04 Jun 2012 06:38:22 -0700

Hi,

I'm not quite sure, but I think there is a bug in
org.apache.wicket.markup.parser.XmlPullParser#parse(CharSequence):
it uses string.toString().getBytes() to create a ByteArrayInputStream,
and getBytes() without an argument encodes with the platform's default
charset.


org.apache.wicket.util.tester.BaseWicketTester#getTagById(String) uses
lastResponseAsString to feed XmlPullParser but lastResponseAsString's
encoding depends on
org.apache.wicket.settings.IRequestCycleSettings#getResponseRequestEncoding().
I.e. the string may be encoded in UTF-8 but later XmlPullParser will
try to process its bytes as Windows-1252 for example.
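
To illustrate the kind of mismatch described above, here is a minimal
sketch in plain Java. It does not touch Wicket at all; the class and
method names are hypothetical, and ISO-8859-1 stands in for
Windows-1252 only because every JVM is guaranteed to ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchSketch {

    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset written, Charset read) {
        return new String(text.getBytes(written), read);
    }

    public static void main(String[] args) {
        String original = "\u00e4\u00f6\u00fc"; // "äöü", escaped so the source file encoding does not matter

        // Same charset on both sides: the text survives.
        String same = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Written as UTF-8, read as ISO-8859-1: each two-byte sequence becomes two characters.
        String garbled = reinterpret(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(original.equals(same));    // true
        System.out.println(original.equals(garbled)); // false
        System.out.println(garbled.length());         // 6
    }
}
```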


Here is a small patch that exposes the problem:
diff --git a/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java b/wicket-core/src/test/java/org/apache/wicket/markup/p
index 2e26d05..15fb496 100644
--- a/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java
+++ b/wicket-core/src/test/java/org/apache/wicket/markup/parser/XmlPullParserTest.java
@@ -191,6 +191,13 @@ public class XmlPullParserTest extends Assert
                assertNull(parser.getEncoding());
                tag = parser.nextTag();
                assertNull(tag);
+
+               String expected = "äöü€";
+               parser.parse("<dummy>"+expected+"</dummy>");
+               XmlTag openTag = parser.nextTag();
+               XmlTag closeTag = parser.nextTag();
+               String actual = parser.getInput(openTag.getPos() + openTag.getLength(), closeTag.getPos()).toString();
+               assertEquals(expected, actual);
        }

        /**

Apply this patch and run the test with -Dfile.encoding=latin1. It will
fail in the comparison. Run it with UTF-8 and it will pass.
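
One plausible reason for the latin1 failure is that ISO-8859-1 cannot
represent the Euro sign at all, so String#getBytes replaces it with '?'.
Whether this is exactly what happens inside XmlPullParser is an
assumption; the sketch below only shows the JDK behaviour, using a
hypothetical class name and an explicit charset in place of
-Dfile.encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodeSketch {

    /** Encodes and decodes text with the same charset, as a stand-in for the platform default. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String expected = "\u00e4\u00f6\u00fc\u20ac"; // "äöü€"

        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(expected.equals(roundTrip(expected, StandardCharsets.UTF_8))); // true

        // ISO-8859-1 has no Euro sign; getBytes substitutes '?' for it.
        String viaLatin1 = roundTrip(expected, StandardCharsets.ISO_8859_1);
        System.out.println(viaLatin1.equals("\u00e4\u00f6\u00fc?")); // true
        System.out.println(expected.equals(viaLatin1));              // false
    }
}
```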

I remember Juergen had a similar problem with one of the Wicket core
tests that uses the Euro sign in an assertion, and he fixed it by using
a Unicode escape (\uabcd).
But in this case the response is encoded with whatever is configured
at IRequestCycleSettings#getResponseRequestEncoding() and
XmlPullParser tries to read it with the platform default encoding.

Is this a bug, and how can we solve it?

-- 
Martin Grigorov
jWeekend
Training, Consulting, Development
http://jWeekend.com
