[
https://issues.apache.org/jira/browse/LANG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sebb updated LANG-859:
----------------------
Description:
According to the XML 1.0 specification, some Unicode characters are not
allowed in the content of an XML document:
http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as
&#x<hex-code>; or &#<dec-code>;
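As an illustration of the two reference forms named above, the following sketch formats a code point as a hexadecimal or a decimal numeric character reference. The class and method names are hypothetical and are not part of Commons Lang.

```java
final class NumericRefs {
    private NumericRefs() {}

    // Hexadecimal form: &#x<hex-code>;
    static String toHexRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint) + ";";
    }

    // Decimal form: &#<dec-code>;
    static String toDecRef(int codePoint) {
        return "&#" + codePoint + ";";
    }
}
```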
{code}
// Assumes JUnit 4 for assertEquals; the original snippet does not show its imports.
import static org.junit.Assert.assertEquals;

import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.commons.lang.StringEscapeUtils;
import org.w3c.dom.Document;

public class Lang859Repro {
    public static void main(String[] args) throws Exception {
        String xmlValidText = "good";
        // Passes assertion
        assertEquals("good", StringEscapeUtils.escapeXml("good"));

        char xmlInvalidChar = (char) 0x2;
        String xmlInvalidText = String.valueOf(xmlInvalidChar);
        // Fails assertion
        assertEquals("", StringEscapeUtils.escapeXml(xmlInvalidText));

        System.out.println("Is invalid: "
                + org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
        String xml =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<chars>" +
                "<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
                "<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) + "</invalid>" +
                "</chars>";
        // Parsing fails with: An invalid XML character (Unicode: 0x2) was found
        // in the element content of the document
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        System.out.println(document);
    }
}
{code}
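For reference, the Char production at the URL above can be checked directly. The sketch below is only an assumption about how such a check might look; XmlChars, isXml10Char and stripInvalid are hypothetical names, not Commons Lang API. Note that XML 1.0 also requires characters referred to by character references to match Char, so this sketch drops disallowed code points (which matches the empty string expected by the failing assertion) rather than escaping them.

```java
final class XmlChars {
    private XmlChars() {}

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: returns the input with every code point
    // outside the Char production removed. Unpaired surrogates are
    // outside the production and are removed as well.
    static String stripInvalid(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

With this sketch, XmlChars.stripInvalid(String.valueOf((char) 0x2)) yields an empty string, while "good" is returned unchanged.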
was:
According to specification of XML version 1.0 there are Unicode characters that
are not allowed in the content of the XML document
http://www.w3.org/TR/xml/#charsets
StringEscapeUtils.escapeXml(value) should escape such characters as
&#x<hex-code>; or &#<dec-code>;
public static void main(String[] args) throws Exception {
String xmlValidText = "good";
// Passes assertion
assertEquals(StringEscapeUtils.escapeXml("good"), "good");
char xmlInvalidChar = (char) 0x2;
String xmlInvalidText = String.valueOf(xmlInvalidChar);
// Fails assertion
assertEquals(StringEscapeUtils.escapeXml(xmlInvalidText), "");
System.out.println("Is valid: " +
org.apache.xerces.util.XMLChar.isInvalid(xmlInvalidChar));
String xml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
"<chars>" +
"<valid>" + StringEscapeUtils.escapeXml(xmlValidText) + "</valid>" +
"<invalid>" + StringEscapeUtils.escapeXml(xmlInvalidText) +
"</invalid>" +
"</chars>";
// An invalid XML character (Unicode: 0x2) was found in the element content of the document
Document document =
DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new
ByteArrayInputStream(xml.getBytes("UTF-8")));
System.out.println(document);
}
> org.apache.commons.lang.StringEscapeUtils.escapeXml doesn't escape chars
> which are considered invalid according to W3C specification
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: LANG-859
> URL: https://issues.apache.org/jira/browse/LANG-859
> Project: Commons Lang
> Issue Type: Bug
> Components: lang.*
> Affects Versions: 2.6
> Reporter: Sergey Bushik