[
https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929
]
Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:53 PM:
--------------------------------------------------------------
I wonder if the use cases of escapeEcmaScript() can be scrutinized.
* Outputting a standalone JavaScript file containing string literals. Generating
string literals surrounded by double or single quotes seems to be covered by
the existing code in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
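A minimal sketch of what the quoting step has to guarantee in this case. It assumes a hand-rolled helper: the class and method names are hypothetical, and this is not the Commons Text implementation. Every character that could end or alter a quoted literal is backslash-prefixed, including the backslash itself. As a result, the reporter's test input \" comes out as \\\", and a JavaScript parser reads that back as the original two characters.

{code:java}
/**
 * Hypothetical, simplified stand-in for escaping text placed inside a quoted
 * JavaScript string literal. NOT the Commons Text implementation.
 */
public final class JsLiteralSketch {
    private JsLiteralSketch() {}

    public static String escapeJsString(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                // The backslash itself is escaped, so input cannot "escape the escape".
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default:
                    // Other control characters and the JS line/paragraph
                    // separators (U+2028, U+2029) become 4-digit hex escapes.
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        String hex = Integer.toHexString(c);
                        sb.append("\\u");
                        for (int k = hex.length(); k < 4; k++) {
                            sb.append('0');
                        }
                        sb.append(hex);
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dq = Character.toString('"');
        // The reporter's test input: a backslash followed by a double quote.
        String input = "\\\"";
        // prints alert("\\\"");
        System.out.println("alert(" + dq + escapeJsString(input) + dq + ");");
    }
}
{code}

This only protects the JavaScript string literal. Once the result is embedded in HTML, the HTML-level rules in the following points also apply.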
* Outputting an HTML attribute whose value is JavaScript containing string
literals. This needs a new method *escapeHtmlAttr*. Depending on whether the
attribute value is surrounded by quotes, all characters of the value
will go through either a minimal substitution of [single/double quotes and
ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
with HTML character references or a broader replacement of [whitespace,
ampersand, single/double quotes, equals, greater/less-than and
backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
Safety calls for using the broader escaping by default (and allowing the narrow
one as an option). For example:
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq +
escapeEcmaScript(input) + dq + ")") + dq);
{code}
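A minimal sketch of the proposed *escapeHtmlAttr*. It assumes numeric character references (&#NN;) as the replacement and a boolean switch for the narrow mode. No such method is assumed to exist in Commons Text, so its name and signature here are hypothetical. The broad set follows the characters listed above for the unquoted attribute value state.

{code:java}
/**
 * Hypothetical sketch of the proposed escapeHtmlAttr(); not an existing
 * Commons Text API.
 */
public final class HtmlAttrSketch {
    private HtmlAttrSketch() {}

    /** Broad escaping (the safe default): also safe for an unquoted value. */
    public static String escapeHtmlAttr(String value) {
        return escapeHtmlAttr(value, true);
    }

    /** broad == false gives the narrow mode, only for quoted attribute values. */
    public static String escapeHtmlAttr(String value, boolean broad) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean mustEscape;
            switch (c) {
                // Significant inside a single- or double-quoted attribute value.
                case '&': case '"': case '\'':
                    mustEscape = true;
                    break;
                // Additionally significant in an unquoted attribute value:
                // HTML whitespace (tab, LF, FF, CR, space), '=', '<', '>', '`'.
                case '\t': case '\n': case '\f': case '\r': case ' ':
                case '=': case '<': case '>': case '`':
                    mustEscape = broad;
                    break;
                default:
                    mustEscape = false;
            }
            if (mustEscape) {
                // Numeric character reference, e.g. '"' -> &#34;
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dq = Character.toString('"');
        // Assume the literal "x" was already escaped for JavaScript.
        String js = "alert(" + dq + "x" + dq + ")";
        // prints onmouseover="alert(&#34;x&#34;)"
        System.out.println("onmouseover=" + dq + escapeHtmlAttr(js) + dq);
    }
}
{code}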
* Outputting string literals in the script tag contents. Because browsers allow
readable JavaScript between the script tags, browsers [do not apply a straight
decoding
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
to it similar to the one applied to HTML attributes. By not escaping the
ampersand character, escapeEcmaScript follows this rule and therefore avoids
redundant escaping.
One HTML-level rule still applies to script contents, and the escaping code
appears vulnerable to it. According to the WHATWG HTML parsing rules, the end
script tag </script> terminates the script element regardless of the
JavaScript parsing state (for example, even inside a string literal). Changing
escapeEcmaScript() to *escape the less-than character* with the backslash-x
notation (\x3C) will prevent *XSS attacks that inject the end script tag*
</script>. A simple backslash prefix would not be enough, because the output
would still contain the characters </script>. Escaping the greater-than
character does not seem necessary but would look symmetrical to escaping the
less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq +
")</script>");
{code}
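A sketch of the suggested change, written as a hypothetical post-processing step applied to text that has already been escaped for a JavaScript string literal. The class and method names are illustrative only. Each '<' becomes the \x3C escape and, optionally, each '>' becomes \x3E. JavaScript reads these back as the same characters, while the HTML tokenizer never sees the sequence </script. Post-processing is safe here because a backslash-escaping scheme never emits '<' or '>' as part of an escape sequence.

{code:java}
/**
 * Hypothetical post-processing step for an already JS-escaped string literal
 * body: hex-escape angle brackets so "</script" can never appear.
 */
public final class ScriptBlockSketch {
    private ScriptBlockSketch() {}

    public static String escapeAngleBrackets(String jsEscaped) {
        StringBuilder sb = new StringBuilder(jsEscaped.length() + 8);
        for (int i = 0; i < jsEscaped.length(); i++) {
            char c = jsEscaped.charAt(i);
            if (c == '<') {
                sb.append("\\x3C");   // JavaScript reads this back as '<'
            } else if (c == '>') {
                sb.append("\\x3E");   // optional, only for symmetry
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String dq = Character.toString('"');
        // Stand-in for input that has already been escaped for a JS string literal.
        String literalBody = "x</script><script>evil()";
        System.out.println("<script>alert(" + dq + escapeAngleBrackets(literalBody)
                + dq + ")</script>");
    }
}
{code}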
was (Author: ilatypov):
I wonder if the use cases of escapeEcmaScript() can be scrutinized.
* Outputting a standalone JavaScript file containing string literals. Generating
string literals surrounded by double or single quotes seems to be covered by
the existing code in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute whose value is JavaScript containing string
literals. This needs a new method *escapeHtmlAttr*. Depending on whether the
attribute value is surrounded by quotes, all characters of the value
will go through either a minimal substitution of [single/double quotes and
ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
with HTML character references or a broader replacement of [whitespace,
ampersand, single/double quotes, equals, greater/less-than and
backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
Safety calls for using the broader escaping by default (and allowing the narrow
one as an option). For example:
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq +
escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. Because browsers allow
readable JavaScript between the script tags, browsers [stopped applying a
straight decoding
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
to it similar to the one applied to HTML attributes. By not escaping the
ampersand character, escapeEcmaScript avoids corrupting string literals, which
escaping for a decoding step that never happens would do.
One HTML-level rule still applies to script contents, and the escaping code
appears vulnerable to it. According to the WHATWG HTML parsing rules, the end
script tag </script> terminates the script element regardless of the
JavaScript parsing state (for example, even inside a string literal). Changing
escapeEcmaScript() to *escape the less-than character* with the backslash-x
notation (\x3C) will prevent *XSS attacks that inject the end script tag*
</script>. A simple backslash prefix would not be enough, because the output
would still contain the characters </script>. Escaping the greater-than
character does not seem necessary but would look symmetrical to escaping the
less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq +
")</script>");
{code}
> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
> Key: TEXT-42
> URL: https://issues.apache.org/jira/browse/TEXT-42
> Project: Commons Text
> Issue Type: Bug
> Reporter: Andy Reek
> Labels: XSS
> Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript escapes by
> prefixing a '\' to every character that must be escaped. I am not sure whether
> this is really secure, judging by the comments on
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
> They say an attack is possible by escaping the escape character. I tested this
> with the string '\"' and the output was '\\\"'. Is this really secure for
> ECMAScript/JavaScript? Or is it better to use the implementation used by
> OWASP?
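For comparison with the OWASP approach mentioned in the issue, here is a minimal sketch of the rule as worded on the cheat sheet: except for alphanumeric characters, every character below 256 is written in the \xHH form. Three assumptions apply. The class name is hypothetical, and this is neither the OWASP ESAPI nor the OWASP Java Encoder API. "Alphanumeric" is read as ASCII letters and digits. Characters at or above 256 pass through unchanged, which is a simplification. With this scheme the test input \" becomes \x5C\x22, so no backslash from the input survives to interact with the escaping. The backslash-prefixed output '\\\"' from the test is also read by a JavaScript parser as the original two characters. The \xHH form additionally keeps characters such as '<' and '/' out of the output.

{code:java}
/**
 * Hypothetical sketch of OWASP-style JavaScript data escaping: every
 * non-alphanumeric character below 256 becomes \xHH. Not an OWASP library API.
 */
public final class OwaspStyleJsEscapeSketch {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private OwaspStyleJsEscapeSketch() {}

    // Assumption: "alphanumeric" means ASCII letters and digits only.
    static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    public static String escapeJsValue(String input) {
        StringBuilder sb = new StringBuilder(input.length() * 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 256 && !isAsciiAlphanumeric(c)) {
                // Two uppercase hex digits, e.g. '"' -> \x22
                sb.append('\\').append('x').append(HEX[c >> 4]).append(HEX[c & 0xF]);
            } else {
                // Simplification: characters >= 256 are left unchanged here.
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The test input from the issue: a backslash followed by a double quote.
        System.out.println(escapeJsValue("\\\""));   // prints \x5C\x22
    }
}
{code}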
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)