[
https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171218#comment-14171218
]
Robert Sussland edited comment on LANG-1042 at 10/14/14 5:07 PM:
-----------------------------------------------------------------
Is there any chance we could deprecate this method? Regardless of what the
original author was intending, they have now permeated the open source
eco-system:
a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the
results, users are expecting html string escaping in both attribute values and
html nodes, with over 14K hits in JSPs where they are manually using this
function to perform security encoding.
https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults
What is saving the majority of these users is the (Java) convention that html
attributes be double rather than single quoted. However, in Rails, attributes
are single quoted by convention, and in JS-land you see a mix. I.e. we cannot
rely on users being accidentally safe when using this function.
This function is also used to encode attribute values in Apache Struts. Spring
MVC is not importing the package but effectively copied the code in
HtmlUtils.java.
Then Grails copied Spring's encoder (c.f. package org.grails.encoder.impl)
which was also picked up by Wicket
(wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java).
Point being, you have the vast majority of Java MVC frameworks now improperly
escaping html attributes and crucially relying on their double quote
conventions for security. This is a concern when they take these same libraries
re-use them.
Here is the Play project, which for the most part uses the StringEscapeUtils
methods, except for HTML encoding.
{code}
/api/src/main/scala/play/twirl/api/Formats.scala
def escape(text: String): Html = {
// Using our own algorithm here because commons lang escaping wasn't
designed for protecting against XSS, and there
// don't seem to be any other good generic escaping tools out there.
val sb = new StringBuilder(text.length)
text.foreach {
case '<' => sb.append("<")
case '>' => sb.append(">")
case '"' => sb.append(""")
case '\'' => sb.append("'")
case '&' => sb.append("&")
case c => sb += c
}
new Html(sb.toString)
}
{code}
It would be nice if, instead of each developer needing to roll their own
escaping library, they could rely on a common string escaping library. The
StringEscapeUtils does this except for the html method -- what about
deprecating this method and introducing a new one -- secureHtmlEscape -- that
escapes <, >, ', ", and &?
I understand that encoding is context sensitive. Generally you need to perform
a sequence of encodings to match the sequence of browser parsings, but the
sequence is built from HTML, JS, CSS, and URI encoding operations, so there is
a need for a common definition of these. Then frameworks and toolkits can
perform the proper contextual encoding using these functions.
was (Author: rsussland):
Is there any chance we could deprecate this method? Regardless of what the
original author was intending, they have now permeated the open source
eco-system:
a github search shows 25K hits for StringEscapeUtils.escapeHtml. Looking at the
results, users are expecting html string escaping in both attribute values and
html nodes, with over 14K hits in JSPs where they are manually using this
function to perform security encoding.
https://github.com/search?utf8=%E2%9C%93&q=StringEscapeUtils.escapeHtml&type=Code&ref=searchresults
What is saving the majority of these users is the (Java) convention that html
attributes be double rather than single quoted. However, in Rails, attributes
are single quoted by convention, and in JS-land you see a mix. I.e. we cannot
rely on users being accidentally safe when using this function.
This function is also used to encode attribute values in Apache Struts. Spring
MVC is not importing the package but effectively copied the code in
HtmlUtils.java. Then Grails copied Spring's encoder (c.f. package
org.grails.encoder.impl;) which was also picked up by Wicket
(wicket-util/src/main/java/org/apache/wicket/util/string/Entities.java).
Point being, you have the vast majority of Java MVC frameworks now improperly
escaping html attributes and crucially relying on their double quote
conventions for security. This is a concern when they take these same libraries
and use them for encoding data sent to the client, or perform manual encoding
in the controller with the unsafe escapeHTML function. All because of this
escapeHtml function.
Here is the Play project, which for the most part uses the StringEscapeUtils
methods.
{code}
/api/src/main/scala/play/twirl/api/Formats.scala
def escape(text: String): Html = {
// Using our own algorithm here because commons lang escaping wasn't
designed for protecting against XSS, and there
// don't seem to be any other good generic escaping tools out there.
val sb = new StringBuilder(text.length)
text.foreach {
case '<' => sb.append("<")
case '>' => sb.append(">")
case '"' => sb.append(""")
case '\'' => sb.append("'")
case '&' => sb.append("&")
case c => sb += c
}
new Html(sb.toString)
}
{code}
It would be nice if, instead of each developer needing to roll their own
escaping library, they could rely on a common string escaping library. The
StringEscapeUtils does this except for the html method -- what about
deprecating this method and introducing a new one -- secureHtmlEscape -- that
escapes <, >, ', ", and &?
> StringEscapeUtils.escapeHtml() does not escape single quote
> -----------------------------------------------------------
>
> Key: LANG-1042
> URL: https://issues.apache.org/jira/browse/LANG-1042
> Project: Commons Lang
> Issue Type: Bug
> Reporter: Robert Sussland
> Priority: Critical
>
> The String Escape Utils should ensure that encoded data cannot escape from a
> string. However in HTML (starting with 1.0 and until the present), attribute
> values may be denoted by either single or double quotes. Therefore single
> quotes need to be escaped just as much as double quotes.
> From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
> {quote}
> By default, SGML requires that all attribute values be delimited using either
> double quotation marks (ASCII decimal 34) or single quotation marks (ASCII
> decimal 39). Single quote marks can be included within the attribute value
> when the value is delimited by double quote marks, and vice versa. Authors
> may also use numeric character references to represent double quotes
> (&#34\;) and single quotes (&#39\;). For double quotes authors can
> also use the character entity reference &quot;.
> {quote}
> Note that there have been several bugs in the wild in which string encoders
> use this library under the hood, and as a result fail to properly escape html
> attributes in which user input is stored:
> <div title='<%=user_data%>'>Howdy</div>
> if user_data = ' onclick='payload' '
> then an attacker can inject their code into the page even if the developer is
> using the string escape utils to escape the user string.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)