Sergey Bushik created LANG-859:
----------------------------------

             Summary: org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars which are considered invalid according to W3C specification
                 Key: LANG-859
                 URL: https://issues.apache.org/jira/browse/LANG-859
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.*
    Affects Versions: 2.6
            Reporter: Sergey Bushik


According to the XML 1.0 specification, some Unicode characters are not 
allowed in the content of an XML document:
http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as numeric 
character references: &#x<hex-code>; or &#<dec-code>;
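
For reference, the character ranges in the linked section of the specification (the Char production) can be sketched as a small check. This is only an illustration: the class name Xml10Chars and the method name isValid are made up here and are not part of Commons Lang.

```java
// Hypothetical helper (not part of Commons Lang): tests a code point against
// the XML 1.0 "Char" production from http://www.w3.org/TR/xml/#charsets
public final class Xml10Chars {
    private Xml10Chars() {}

    /** Returns true if the code point may appear in an XML 1.0 document. */
    public static boolean isValid(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    public static void main(String[] args) {
        System.out.println(isValid(0x2)); // prints "false"
        System.out.println(isValid('g')); // prints "true"
    }
}
```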

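import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.commons.lang.StringEscapeUtils;
import org.w3c.dom.Document;
// Assumption: assertEquals comes from JUnit 4; the report does not say
import static org.junit.Assert.assertEquals;

// Wrapper class added so the snippet compiles; the name is illustrative
public class Lang859Repro {
    public static void main(String[] args) throws Exception {
        String xmlValidText = "good";
        // Passes assertion
        assertEquals("good", StringEscapeUtils.escapeXml(xmlValidText));

        char xmlInvalidChar = (char) 0x2;
        String xmlInvalidText = String.valueOf(xmlInvalidChar);
        // Fails assertion: escapeXml returns the character unchanged
        assertEquals("&#x2;", StringEscapeUtils.escapeXml(xmlInvalidText));

        System.out.println("Is invalid: "
                + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
        String xml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<chars>" +
                "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
                "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
                "</chars>";
        // Parsing fails with: An invalid XML character (Unicode: 0x2) was
        // found in the element content of the document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(document);
    }
}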
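
Until escapeXml handles such characters, one possible workaround is to filter the input before escaping it. The sketch below is an assumption about how that could look, not the library's behaviour; XmlSanitizer and stripInvalidXml10Chars are hypothetical names. It removes the characters rather than turning them into &#x..; references because XML 1.0 also requires characters referred to by character references to match the Char production, so a conforming parser would be expected to reject &#x2; as well.

```java
// Hypothetical workaround (not part of Commons Lang): drops code points that
// XML 1.0 does not allow, so the result can be passed to an XML escaper.
public final class XmlSanitizer {
    private XmlSanitizer() {}

    // XML 1.0 "Char" production, see http://www.w3.org/TR/xml/#charsets
    private static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the input with every code point outside the Char production removed. */
    public static String stripInvalidXml10Chars(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Iterate by code point so surrogate pairs are kept together;
            // an unpaired surrogate is returned as-is and then filtered out.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "go" + (char) 0x2 + "od";
        System.out.println(stripInvalidXml10Chars(dirty)); // prints "good"
    }
}
```

With this in place, the reproduction above could call StringEscapeUtils.escapeXml(XmlSanitizer.stripInvalidXml10Chars(xmlInvalidText)) so that the generated document parses, at the cost of silently losing the offending characters.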

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
