[ 
https://issues.apache.org/jira/browse/HDFS-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190773#comment-14190773
 ] 

Colin Patrick McCabe commented on HDFS-7309:
--------------------------------------------

So, the intention when writing {{XMLUtils#mangleXmlString}} was that it would 
handle the things the normal XML machinery doesn't.  XML simply doesn't allow 
certain code points in a document, so it provides no standard way to escape 
them.  One example is the first few code points: 0, 1, 2, etc.  There IS a 
standard way to escape things like <, >, &, etc., so we didn't handle those.  
The SAX classes (e.g. {{org.xml.sax.XMLReader}}) already take care of those 
characters.
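
To make the "no way to have certain code points" part concrete, here is a minimal Java sketch of the XML 1.0 {{Char}} production.  The class and method names are made up for illustration and are not part of {{XMLUtils}}.

```java
final class XmlCharCheck {
    // Hypothetical helper: true if the code point may appear in an
    // XML 1.0 document (the "Char" production of the XML 1.0 spec).
    public static boolean isLegalXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        // Code point 1 has no legal representation, escaped or not.
        System.out.println(isLegalXml10(0x1));  // false
        // '<' is a legal character; it only needs escaping as markup.
        System.out.println(isLegalXml10('<'));  // true
    }
}
```

So code points like 0, 1 and 2 need a custom mangling scheme, while < is legal and only needs the standard entity escaping.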

Since you're not using an XML parser, you don't get the benefit of this 
"built-in" escaping.  You could get it manually with something like this (the 
snippet is C# using System.Xml, so it only illustrates the idea):

{code}
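using System.Xml;  // provides XmlDocument

// Parse an escaped fragment and return its plain-text value.
public static string XmlUnescape(string escaped) {
    XmlDocument d = new XmlDocument();
    var node = d.CreateElement("root");
    node.InnerXml = escaped;
    return node.InnerText;
}

// Set the raw text and let the DOM serializer escape markup characters.
public static string XmlEscape(string unescaped) {
    XmlDocument d = new XmlDocument();
    var node = d.CreateElement("root");
    node.InnerText = unescaped;
    return node.InnerXml;
}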
{code}

Or we could add this functionality to {{XMLUtils#mangleXmlString}}.  But we'd 
have to handle all the XML code points that need escaping (I think <, >, &, 
and maybe some of the quote signs).  Also it would need to be optional, to 
avoid double-escaping for callers who are using {{org.xml.sax.XMLReader}}.
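
A rough Java sketch of what that optional escaping could look like.  {{XmlEscapeSketch}}, {{escapeMarkup}} and {{prepare}} are hypothetical names, not existing {{XMLUtils}} methods, and this only covers the five characters with predefined XML entities; it does not do the illegal-code-point mangling.

```java
final class XmlEscapeSketch {
    // Hypothetical helper: replace the five characters that have
    // predefined XML entities (<, >, &, " and ').
    public static String escapeMarkup(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // The flag keeps the escaping optional, so callers whose XML
    // writer already escapes markup can pass false and avoid
    // double-escaping.
    public static String prepare(String s, boolean escape) {
        return escape ? escapeMarkup(s) : s;
    }

    public static void main(String[] args) {
        System.out.println("<someElement>"
            + prepare("Containing<ALessThanSign", true)
            + "</someElement>");
        // prints <someElement>Containing&lt;ALessThanSign</someElement>
    }
}
```

With the flag off the input comes back unchanged, which is what callers going through the SAX classes would want.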

> XMLUtils.mangleXmlString doesn't seem to handle less than sign
> --------------------------------------------------------------
>
>                 Key: HDFS-7309
>                 URL: https://issues.apache.org/jira/browse/HDFS-7309
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Ravi Prakash
>            Priority: Minor
>         Attachments: HDFS-7309.patch
>
>
> My expectation was that "<someElement>" + XMLUtils.mangleXmlString(
>       "Containing<ALessThanSign") + "</someElement>" would be a string 
> acceptable to a SAX parser. However this was not true. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
