[ 
https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171218#comment-14171218
 ] 

Robert Sussland commented on LANG-1042:
---------------------------------------

Is there any chance we could deprecate this method? Regardless of what the 
original author was intending, they have now permeated the open source 
eco-system:

a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the 
results, users are expecting html string escaping in both attribute values and 
html nodes, with over 14K hits in JSPs where they are manually using this 
function to perform security encoding.

https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults

What is saving the majority of these users is the (Java) convention that html 
attributes be double rather than single quoted. However, in Rails, attributes 
are single quoted by convention, and in JS-land you see a mix. I.e. we cannot 
rely on users being accidentally safe when using this function.

This function is also used to encode attribute values in Apache Struts. Spring 
MVC is not importing the package but effectively copied the code in 
HtmlUtils.java. Then Grails copied Spring's encoder (c.f. package 
org.grails.encoder.impl;) which was also picked up by Wicket 
(wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java). 

Point being, you have the vast majority of Java MVC frameworks now improperly 
escaping html attributes and crucially relying on their double quote 
conventions for security. This is a concern when they take these same libraries 
and use them for encoding data sent to the client, or perform manual encoding 
in the controller with the unsafe escapeHTML function.  All because of this 
escapeHtml function.

Here is the Play project, which for the most part uses the StringEscapeUtils 
methods.

{code}
/api/src/main/scala/play/twirl/api/Formats.scala

  def escape(text: String): Html = {
    // Using our own algorithm here because commons lang escaping wasn't 
designed for protecting against XSS, and there
    // don't seem to be any other good generic escaping tools out there.
    val sb = new StringBuilder(text.length)
    text.foreach {
      case '<' => sb.append("&lt;")
      case '>' => sb.append("&gt;")
      case '"' => sb.append("&quot;")
      case '\'' => sb.append("&#x27;")
      case '&' => sb.append("&amp;")
      case c => sb += c
    }
    new Html(sb.toString)
  }
{code}

It would be nice if, instead of each developer needing to roll their own 
escaping library, they could rely on a common string escaping library. The 
StringEscapeUtils does this except for the html method -- what about 
deprecating this method and introducing a new one -- secureHtmlEscape --  that 
escapes <, >, ', ", and &?


> StringEscapeUtils.escapeHtml() does not escape single quote
> -----------------------------------------------------------
>
>                 Key: LANG-1042
>                 URL: https://issues.apache.org/jira/browse/LANG-1042
>             Project: Commons Lang
>          Issue Type: Bug
>            Reporter: Robert Sussland
>            Priority: Critical
>
> The String Escape Utils should ensure that encoded data cannot escape from a 
> string. However in HTML (starting with 1.0 and until the present), attribute 
> values may be denoted by either single or double quotes. Therefore single 
> quotes need to be escaped just as much as double quotes. 
> From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
> {quote}
> By default, SGML requires that all attribute values be delimited using either 
> double quotation marks (ASCII decimal 34) or single quotation marks (ASCII 
> decimal 39). Single quote marks can be included within the attribute value 
> when the value is delimited by double quote marks, and vice versa. Authors 
> may also use numeric character references to represent double quotes 
> (&amp;#34\;) and single quotes (&amp;#39\;). For double quotes authors can 
> also use the character entity reference &amp;quot;.
> {quote}
> Note that there have been several bugs in the wild in which string encoders 
> use this library under the hood, and as a result fail to properly escape html 
> attributes in which user input is stored:
> <div title='<%=user_data%>'>Howdy</div>
> if user_data = ' onclick='payload' ' 
> then an attacker can inject their code into the page even if the developer is 
> using the string escape utils to escape the user string.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to