[
https://issues.apache.org/jira/browse/TEXT-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16223929#comment-16223929
]
Ilguiz Latypov edited comment on TEXT-42 at 10/29/17 3:29 PM:
--------------------------------------------------------------
I wonder whether the use cases of escapeEcmaScript() can be scrutinized.
* Outputting a standalone JavaScript file containing string literals. Generating
string literals to be surrounded by double or single quotes seems to be covered
by the existing code in escapeEcmaScript().
{code:java}
String dq = Character.toString('"');
out.println("alert(" + dq + escapeEcmaScript(input) + dq + ");");
{code}
* Outputting an HTML attribute containing JavaScript containing string
literals. This needs a new method *escapeHtmlAttr*. Depending on the
surrounding quotes or their absence, the characters of the attribute value
will go through either a minimal substitution of [single/double quotes and
ampersand|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(double-quoted)-state]
with HTML entities or a broader replacement of [whitespace,
ampersand, single/double quotes, equals, greater/less-than and
backquotes|https://html.spec.whatwg.org/multipage/parsing.html#attribute-value-(unquoted)-state].
Safety calls for the broader escaping by default (with the narrow one
available as an option). For example:
{code:java}
out.println("onmouseover=" + dq + escapeHtmlAttr("alert(" + dq +
escapeEcmaScript(input) + dq + ")") + dq);
{code}
* Outputting string literals in the script tag contents. The existing code
*lacks protection* against the script's end tag taking precedence over any
contents. Because browsers allow readable JavaScript between the script tags,
they [stopped applying a straight decoding
algorithm|https://stackoverflow.com/questions/41297404/is-it-possible-to-correctly-escape-arbitrary-script-tag-contents]
similar to the one used in HTML attributes. The code in escapeEcmaScript() *must
escape the less-than character* (with either the backslash-x notation or a
simple backslash prefix). In case browsers keep applying their HTML entity
decoding throughout the script tag contents, encoding ampersands with the
backslash-x notation or a single backslash seems necessary as well. Escaping
the greater-than character does not seem necessary but would be symmetrical to
escaping the less-than character.
{code:java}
out.println("<script>alert(" + dq + escapeEcmaScript(input) + dq +
")</script>");
{code}
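The attribute and script tag cases above could be sketched roughly as follows. This is only an illustration under assumptions: escapeHtmlAttr does not exist in Commons Text, escapeJsForScriptBlock is a hypothetical stand-in written for this example (it is not the escapeEcmaScript() implementation), and the class name EscapingSketch is made up. The sketch picks the backslash-x form for the less-than character, since a bare backslash prefix would still leave the character itself in the output for the HTML parser to see.

```java
// Hypothetical sketch; none of these names exist in Commons Text.
final class EscapingSketch {
    private EscapingSketch() {
    }

    /** Broad escaping by default, intended for quoted and unquoted attribute values. */
    static String escapeHtmlAttr(String value) {
        return escapeHtmlAttr(value, true);
    }

    /** With broad == false only quotes and ampersand are replaced (quoted attributes only). */
    static String escapeHtmlAttr(String value, boolean broad) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean narrowSet = c == '&' || c == '"' || c == '\'';
            boolean broadSet = c == '<' || c == '>' || c == '=' || c == '`'
                    || c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
            if (narrowSet || (broad && broadSet)) {
                // Numeric character reference, e.g. a double quote becomes &#x22;
                sb.append("&#x").append(Integer.toHexString(c)).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Escapes the contents of a JavaScript string literal placed between script tags. */
    static String escapeJsForScriptBlock(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\'': sb.append("\\'"); break;
                case '/':  sb.append("\\/"); break;
                case '<':
                case '>':
                case '&':
                    // Backslash-x notation keeps these characters out of the HTML parser's view.
                    sb.append(String.format("\\x%02X", (int) c));
                    break;
                default:
                    if (c < 0x20 || c > 0x7e) {
                        // Control and non-ASCII characters as \uXXXX
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }
}
```

With such helpers, the attribute example would wrap the generated script in EscapingSketch.escapeHtmlAttr(...), and the script tag example would call EscapingSketch.escapeJsForScriptBlock(input) in place of escapeEcmaScript(input).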
> [XSS] Possible attacks through StringEscapeUtils.escapeEcmaScript?
> ------------------------------------------------------------------
>
> Key: TEXT-42
> URL: https://issues.apache.org/jira/browse/TEXT-42
> Project: Commons Text
> Issue Type: Bug
> Reporter: Andy Reek
> Labels: XSS
> Fix For: 1.x
>
>
> org.apache.commons.lang3.StringEscapeUtils.escapeEcmaScript does the escape
> via a prefixed '\' on all characters which must be escaped. I am not sure if
> this is really secure when I look at the comments on
> https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet#RULE_.233_-_JavaScript_Escape_Before_Inserting_Untrusted_Data_into_JavaScript_Data_Values.
> They say it is possible to do an attack by escaping the escape. I tested this
> with the string '\"' and the output was '\\\"'. Is this really
> ecma-/java-script secure? Or is it better to use the implementation used by
> OWASP?