[
https://issues.apache.org/jira/browse/LANG-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165055#comment-14165055
]
Duncan Jones commented on LANG-1042:
------------------------------------
bq. String escaping has a well-defined meaning – the output of this function
should not be able to break out of a string data context, because all characters
that could be interpreted by the html parser as closing out the string data
context are escaped.
I don't think there's anything in the documentation that suggests the escaped
data will be safe in a string data context. In fact, not describing the context
is one of the many flaws in the current docs.
Because the current documentation is so woefully inadequate, I don't think we
can change any behaviour here under the rationale of "it's a bug". Therefore,
I see we have two things to do:
- Have a good think about the current functionality, then document it better
so that people truly understand what it does and in which contexts it is useful
(if any).
- Decide if we should offer other escape methods that work in a wider range of
contexts (including attribute values). If our goal with these methods is to
prevent XSS attacks (amongst other things), then this should be stated clearly
in any resulting method documentation.
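To make the second point concrete, here is a minimal sketch of what an attribute-aware escape method with an explicit context statement in its Javadoc might look like. The class name AttributeEscapeSketch and the method escapeHtmlAttribute are hypothetical and are not part of Commons Lang; the sketch only assumes that escaping both quote characters is enough for quoted attribute values.

```java
// Hypothetical helper for illustration only; not part of Commons Lang.
final class AttributeEscapeSketch {

    /**
     * Escapes text for use inside a quoted HTML attribute value, whether the
     * attribute is delimited by single or double quotes.
     *
     * This sketch makes no claim of safety for unquoted attributes, script,
     * style or URL contexts.
     */
    static String escapeHtmlAttribute(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break; // numeric reference for the single quote
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: &#39; onclick=&#39;payload&#39; 
        System.out.println(escapeHtmlAttribute("' onclick='payload' "));
    }
}
```

Whether something like this belongs in Lang at all is a separate question; the sketch is only meant to show how the intended context could be spelled out in the method documentation.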
I don't entirely understand why these escaping methods are in Lang anyway, so
I'm not in favour of extending them further. But that's just my 2c. I'm
certainly in favour of extending the current Javadoc to ensure future users
don't mistake what these methods are doing.
> StringEscapeUtils.escapeHtml() does not escape single quote
> -----------------------------------------------------------
>
> Key: LANG-1042
> URL: https://issues.apache.org/jira/browse/LANG-1042
> Project: Commons Lang
> Issue Type: Bug
> Reporter: Robert Sussland
> Priority: Critical
>
> The String Escape Utils should ensure that encoded data cannot escape from a
> string. However in HTML (starting with 1.0 and until the present), attribute
> values may be denoted by either single or double quotes. Therefore single
> quotes need to be escaped just as much as double quotes.
> From the standard: http://www.w3.org/TR/html4/intro/sgmltut.html#h-3.2.2
> {quote}
> By default, SGML requires that all attribute values be delimited using either
> double quotation marks (ASCII decimal 34) or single quotation marks (ASCII
> decimal 39). Single quote marks can be included within the attribute value
> when the value is delimited by double quote marks, and vice versa. Authors
> may also use numeric character references to represent double quotes
> (&#34;) and single quotes (&#39;). For double quotes authors can
> also use the character entity reference &quot;.
> {quote}
> Note that there have been several bugs in the wild in which string encoders
> use this library under the hood, and as a result fail to properly escape html
> attributes in which user input is stored:
> <div title='<%=user_data%>'>Howdy</div>
> if user_data = ' onclick='payload' '
> then an attacker can inject their code into the page even if the developer is
> using the string escape utils to escape the user string.
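The breakout described above can be reproduced with a small self-contained sketch. The method escapeFourChars below is a hypothetical stand-in that escapes only &, <, > and the double quote, mirroring the behaviour reported in this issue; it is not the library's code. The render method is likewise an assumption that models the single-quoted template from the example.

```java
// Illustration only: escapeFourChars is a stand-in, not StringEscapeUtils.
final class SingleQuoteBreakoutDemo {

    // Escapes &, <, > and the double quote, but leaves the single quote untouched.
    static String escapeFourChars(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Models the template <div title='<%=user_data%>'>Howdy</div>.
    static String render(String userData) {
        return "<div title='" + escapeFourChars(userData) + "'>Howdy</div>";
    }

    public static void main(String[] args) {
        // Prints: <div title='' onclick='payload' '>Howdy</div>
        System.out.println(render("' onclick='payload' "));
    }
}
```

The leading single quote in the input closes the title attribute, so the rest of the input is parsed as further attributes on the div.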