Re: [text] On the value of idempotent string escape methods?

sebb Tue, 21 Feb 2017 06:56:51 -0800

On 21 February 2017 at 12:40, Rob Tompkins <chtom...@apache.org> wrote:
>
>> On Feb 21, 2017, at 6:02 AM, sebb <seb...@gmail.com> wrote:
>>
>> On 21 February 2017 at 04:40, Sampanna Kahu <sampy...@gmail.com> wrote:
>>> Hi Guys,
>>> Very good points are being made above. Please allow me to add my two cents
>>> :-)
>>>
>>> What if the string contains syntactically valid HTML characters/tags and
>>> our aim is to prevent rendering these tags in the browser when this string
>>> is being served via a web application? Or prevent the execution of harmful
>>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>>> here, right?
>>
>> I don't think so.
>>
>>> To explain better, let's consider an example of the specific use-case that
>>> I had in mind when building the 'escapeOnce' method:
>>> Consider the scenario of a simple RESTful web application where users can
>>> manipulate their text using simple CRUD operations. Let's assume that we do
>>> not have the 'escapeOnce' method yet.
>>> 1. A user comes and submits his string. We escape it and store it in our
>>> database. If the string had any HTML characters, they would have gotten
>>> escaped.
>>>
>>> 2. After some time, the same user fetches his string, adds some more HTML
>>> characters and submits it. At this point, although the escape method would
>>> correctly escape the freshly added HTML characters, it would escape the
>>> older escaped HTML characters again! (for example &gt; would become
>>> &amp;gt;)
>>> And this effect gets magnified if step number 2 above is repeated.
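A minimal Java sketch of step 2. It assumes a hand-rolled escaper that only handles `&`, `<`, `>` and `"`; real code would more likely call something like Commons Text's `StringEscapeUtils.escapeHtml4`. Escaping text that already contains entities encodes those entities a second time:

```java
// Hypothetical helper: escapes only &, <, > and " (a simplification of a real HTML escaper).
final class DoubleEscapeDemo {
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String stored = escape("a > b");        // step 1: stored as "a &gt; b"
        String edited = stored + " && c < d";   // step 2: user appends raw text to the escaped copy
        // The old "&gt;" is escaped again; prints: a &amp;gt; b &amp;&amp; c &lt; d
        System.out.println(escape(edited));
    }
}
```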
>>
>> Of course, that is my point.
>>
>> Also remember that you want to show the original string to the user.
>> That's not possible in general if you use this approach.
>>
>> Suppose they originally entered
>>
>> "To code ampersand (&) in HTML, use '&amp;'"
>>
>> Using escapeOnce, this would become:
>>
>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>
>> You can either show that directly to the user, or use an unescapeOnce
>> and show them:
>>
>> "To code ampersand (&) in HTML, use '&'"
>>
>> Neither makes any sense.
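To make the ambiguity concrete, here is a sketch of one plausible reading of "escape once": copy anything that already looks like a known entity and escape everything else. The name `escapeOnce` and the four-entity table are assumptions, not the actual Commons Text code. Two different inputs produce the same output:

```java
// Sketch of one plausible "escape once": entities that already look valid are copied,
// everything else is escaped. The four-entity table is an assumption for brevity.
final class EscapeOnceDemo {
    private static final String[] KNOWN_ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};

    static String escapeOnce(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '&') {
                for (String e : KNOWN_ENTITIES) {
                    if (s.startsWith(e, i)) {   // looks escaped already: copy unchanged
                        out.append(e);
                        i += e.length();
                        continue outer;
                    }
                }
            }
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String typed = "To code ampersand (&) in HTML, use '&amp;'";
        String other = "To code ampersand (&amp;) in HTML, use '&amp;'";
        // Two different inputs, one output: the original cannot be recovered.
        System.out.println(escapeOnce(typed).equals(escapeOnce(other))); // true
    }
}
```

The function is idempotent, as requested. That same property forces `typed` and its own escaped form `other` to map to one string, so no unescape can tell which one the user entered.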
>>
>>> How do we solve the above problem without the 'escapeOnce' method?
>>
>> Store the raw string in the database and escape it just before display.
>>
>> If you are using JavaScript, then use an approach such as this to escape it:
>>
>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>
>> See:
>>
>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/
>>
>> This has a good discussion of some of the problems.
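The "store the raw string, escape just before display" approach can be sketched on the server side in Java. The class and method names are hypothetical, and the escaper only covers four characters. However many times the user edits the text, exactly one layer of escaping is applied, at output time:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: keeps exactly what the user typed, escapes only on output.
final class RawTextStore {
    private final Map<Long, String> rows = new HashMap<>();

    // No escaping on the way in.
    void save(long id, String raw) {
        rows.put(id, raw);
    }

    // Raw text for editing, e.g. returned as a JSON string by a REST API, not as HTML.
    String loadRaw(long id) {
        return rows.get(id);
    }

    // Escape once, at the moment the text is written into an HTML page.
    String renderHtml(long id) {
        return escapeHtml(rows.get(id));
    }

    // Minimal escaper for &, <, > and " (a stand-in for a complete HTML escaper).
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }
}
```

Repeating Sampanna's step 2 against this store leaves the stored text unchanged apart from the user's additions. The escaped form exists only in the rendered page.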
>>
>> ==
>>
>> Sorry, but it's not possible in general to do what you want, because
>> one cannot reliably determine if a string has been escaped just from
>> looking at the string.
>
> Another thought occurred to me (again despite potential lack of value).
>
> We should be able to quickly check whether the string in question contains
> any escape sequences. A single application of unescape, followed by a
> string-equality check against the original input, would yield a predicate
> for the presence of escape sequences in the input.


Again, what does unescape mean in this context?
Does it ignore incomplete escape sequences, or throw an error?

> From there we could: (1) escape if no escapes were present in the original, 
> or (2) throw an exception if there were escapes present in the original 
> string.
> Again, this feels contrived, so I’m not really suggesting that we add it. I’m 
> just playing with ideas here that could accomplish what Sampanna is going for.
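Rob's predicate can be sketched as follows. The lenient single-pass unescaper, the four-entity table and the names `containsEscapes` and `escapeOrReject` are assumptions for illustration. In particular, the sketch silently copies an incomplete sequence such as a bare "&", which is exactly the choice sebb asks about:

```java
// Sketch of the "does one unescape pass change anything?" predicate.
// The entity table and the lenient unescaper are illustrative assumptions.
final class EscapePresenceCheck {
    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}
    };

    // One left-to-right pass; an '&' that starts no known entity is copied unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            for (String[] e : ENTITIES) {
                if (s.startsWith(e[0], i)) {
                    out.append(e[1]);
                    i += e[0].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // True if a single unescape pass changes the input, i.e. it contains a known entity.
    static boolean containsEscapes(String s) {
        return !unescape(s).equals(s);
    }

    // Option (1): escape input with no entities; option (2): reject input that has some.
    static String escapeOrReject(String s) {
        if (containsEscapes(s)) {
            throw new IllegalArgumentException("input already contains escape sequences");
        }
        return escape(s);
    }
}
```

Note that `escapeOrReject("This &amp; that")` throws even if the user literally meant to write the entity.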

The request is impossible to fulfill reliably, and does not deserve to
be added to a Commons library.

I don't know why this is still being discussed.

> -Rob
>
>>
>> The most one can do is to sanitise the string by escaping anything
>> that is unescaped.
>> However that process corrupts the input - a browser won't display the
>> proper output.
>>
>>> On 20 February 2017 at 21:40, sebb <seb...@gmail.com> wrote:
>>>
>>>> On 20 February 2017 at 15:36, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>
>>>>>> On Feb 20, 2017, at 10:30 AM, sebb <seb...@gmail.com> wrote:
>>>>>>
>>>>>> On 20 February 2017 at 14:55, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 4:31 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 19 February 2017 at 14:29, Raymond DeCampo <r...@decampo.org> wrote:
>>>>>>>>> I am trying to see how having the proposed unescape() method leads to a
>>>>>>>>> useful escape method.
>>>>>>>>>
>>>>>>>>> E.g. clearly unescape("&amp;") would evaluate to "&".  So would
>>>>>>>>> unescape("&amp;amp;").  That means the proposed escape() method would also
>>>>>>>>> have the same output for "&amp;" and "&amp;amp;".
>>>>>>>>>
>>>>>>>>> I think a better approach for an idempotent escape would be to just
>>>>>>>>> unescape the string once, and then run the traditional escape.
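Raymond's suggestion can be sketched as `escape(unescape(s))` with a single unescape pass. The method name `normalize` and the four-entity tables are hypothetical. Because this unescape exactly inverts this escape, `normalize(normalize(s))` reduces to `escape(unescape(s))`, so the composition is idempotent:

```java
// Sketch of "unescape once, then escape"; the entity tables are illustrative assumptions.
final class UnescapeOnceThenEscape {
    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}
    };

    // One left-to-right pass, so "&amp;lt;" becomes "&lt;" rather than "<".
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            for (String[] e : ENTITIES) {
                if (s.startsWith(e[0], i)) {
                    out.append(e[1]);
                    i += e[0].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Idempotent because unescape(escape(t)) == t for these two helpers.
    static String normalize(String s) {
        return escape(unescape(s));
    }
}
```

Unlike the fully-unescaping proposal, this keeps "&amp;" and "&amp;amp;" apart. It is still lossy, however: `normalize("&")` and `normalize("&amp;")` both return "&amp;", which is the state-saving problem Raymond describes.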
>>>>>>>>
>>>>>>>> That does not eliminate the problems, as you state below.
>>>>>>>>
>>>>>>>>> You will
>>>>>>>>> still have issues if the user intended to escape the string "&amp;" but you
>>>>>>>>> are never going to crack that without some kind of state saving.
>>>>>>>>
>>>>>>>> That is my exact point.
>>>>>>>>
>>>>>>>> Since it's not possible for the function to work reliably, we should
>>>>>>>> not mislead users by pretending that there is a magic method that
>>>>>>>> works.
>>>>>>>>
>>>>>>>>> Then, given that the functionality is available via two consecutive calls
>>>>>>>>> to existing methods, I would probably be disinclined to include it in the
>>>>>>>>> library.
>>>>>>>>
>>>>>>>> +1
>>>>>>>
>>>>>>> I’m a (+1) for removal as well.
>>>>>>>
>>>>>>> Also, I didn’t mean for my example to sound like a proposal. I was merely
>>>>>>> trying to get to a potentially valuable stateless idempotent string escape
>>>>>>> function. Its contrivance is quite clear.
>>>>>>>
>>>>>>> Any other comments out there?
>>>>>>>
>>>>>>> We could provide a stateful escaper (that figures out how many times a
>>>>>>> string has been escaped), or a method that returns that number. Again, I’m
>>>>>>> not all that sure of the value of such methods.
>>>>>>
>>>>>> I don't think it's possible to work out the number of times a string
>>>>>> has been escaped.
>>>>>
>>>>> That may indeed be true, but it is possible to return the number of times
>>>>> unescape needs to be run before subsequent unescapes yield the same result.
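Rob's measure can be sketched as a loop that counts unescape passes until the string stops changing. The name `unescapeDepth` and the four-entity table are hypothetical. The loop terminates because every pass that changes the string replaces an entity with shorter text:

```java
// Sketch of "how many unescape passes change the string?"; entity table is an assumption.
final class UnescapeDepth {
    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}
    };

    // One left-to-right pass; a bare '&' is treated as literal text and copied.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            for (String[] e : ENTITIES) {
                if (s.startsWith(e[0], i)) {
                    out.append(e[1]);
                    i += e[0].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    static int unescapeDepth(String s) {
        int passes = 0;
        String current = s;
        String next = unescape(current);
        while (!next.equals(current)) {   // stop at the first pass that changes nothing
            passes++;
            current = next;
            next = unescape(current);
        }
        return passes;
    }
}
```

The count reports how many layers merely look escaped, not how many times escape was actually applied. For example, "This &amp; that" reports 1 even if the user typed the entity literally. The sketch also resolves sebb's ambiguity one particular way: a bare "&" does not stop the loop.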
>>>>
>>>> That in itself is potentially ambiguous.
>>>> Does the unescaper keep going until there are no valid escape
>>>> sequences left, or does it stop when there is at least one ampersand
>>>> which is not part of a valid escape sequence?
>>>>
>>>>> Again, I’m not sure if this is a valuable measure to concern ourselves with.
>>>>
>>>> I don't think it provides anything useful.
>>>>
>>>>>>
>>>>>> The most one can do is to determine if a string has not been escaped.
>>>>>> That would be the case where a string has one or more unescaped
>>>>>> characters in it.
>>>>>> For example "This & that" has obviously not been escaped.
>>>>>>
>>>>>> However if a string has no un-escaped characters in it, that does not
>>>>>> necessarily mean that it has already been escaped.
>>>>>> For example: "This &amp; that".
>>>>>> This might have been escaped - or it might not.
>>>>>
>>>>> Ah, I was taking “having been escaped” to mean that the string contains
>>>>> escape sequences.
>>>>>
>>>>>> For example it could be the answer to: “How does one code 'This &
>>>>>> that' in HTML?”
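The one-directional test sebb describes can be sketched as follows, again assuming only four entities; the class and method names are hypothetical. A `true` result means the string cannot be the output of the escaper. A `false` result only means "maybe", as with "This &amp; that":

```java
// Sketch of the one-directional check; the entity list is an illustrative assumption.
final class EscapeStateCheck {
    private static final String[] KNOWN_ENTITIES = {"&amp;", "&lt;", "&gt;", "&quot;"};

    // True means "this string is certainly not the output of the escaper":
    // it contains a raw <, > or ", or an & that does not start a known entity.
    // False only means "maybe escaped, maybe raw text that happens to contain entities".
    static boolean definitelyNotEscaped(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '<' || c == '>' || c == '"') {
                return true;
            }
            if (c == '&' && !startsKnownEntity(s, i)) {
                return true;
            }
        }
        return false;
    }

    private static boolean startsKnownEntity(String s, int i) {
        for (String e : KNOWN_ENTITIES) {
            if (s.startsWith(e, i)) {
                return true;
            }
        }
        return false;
    }
}
```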
>>>>>>
>>>>>> The application has to keep track of the escape-state of the string.
>>>>>
>>>>> Definitely agreed with your definition of “having been escaped.”
>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>> -Rob
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 18, 2017 at 12:04 PM, Rob Tompkins <chtom...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> In preparation for the 1.0 release, I think we should address Sebb's
>>>>>>>>>> concern in TEXT-40 about the attempt to create "idempotent" string escape
>>>>>>>>>> methods. By idempotent I mean someMethod("some string") =
>>>>>>>>>> someMethod(someMethod(someMethod(...someMethod("some string")))), that is,
>>>>>>>>>> a single application of a method is equal to any number of applications
>>>>>>>>>> of the method on the same input.
>>>>>>>>>>
>>>>>>>>>> Below I lay out a mechanism by which it is possible to write such methods,
>>>>>>>>>> but I don’t know the value of writing such methods. I'm merely expressing
>>>>>>>>>> that idempotency is a possibility.
>>>>>>>>>>
>>>>>>>>>> For string "un-escaping", I believe that we can write a method that is,
>>>>>>>>>> indeed, idempotent by simply running the un-escape method a finite number
>>>>>>>>>> of times, until the string remains unchanged between applications of the
>>>>>>>>>> un-escaping method. (I believe that I can write a proof that all un-escape
>>>>>>>>>> methods have such a point, if that is needed for the sake of discussion.)
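One way such a proof could go assumes that each escape sequence is strictly longer than the text it decodes to. This holds for HTML entities but not necessarily for every escaping scheme. Writing u for one un-escape pass and |s| for the length of s, every pass that changes the string shortens it, so at most |s| passes can change it:

```latex
u(s) \neq s \;\Rightarrow\; |u(s)| < |s|
\quad\text{hence}\quad
\exists\, k \le |s| \;:\; u^{k+1}(s) = u^{k}(s)
```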
>>>>>>>>>>
>>>>>>>>>> If indeed we can create an idempotent un-escape method, then we can simply
>>>>>>>>>> run that method and then run the escaping method one time. If we always
>>>>>>>>>> completely unescape and then escape once, then we do have an idempotent
>>>>>>>>>> method.
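The composition Rob describes can be sketched in Java. The sketch assumes a four-entity escaper and a single-pass unescaper that exactly inverts it; the names `unescapeFully` and `idempotentEscape` are hypothetical. The fully unescaped string t is a fixed point of unescape, and unescape(escape(t)) equals t, so applying the method to its own output returns the same string. The sketch also exhibits Raymond's objection that "&amp;" and "&amp;amp;" collapse to the same output:

```java
// Sketch of "unescape until nothing changes, then escape once"; entity tables are assumptions.
final class FullyUnescapeThenEscape {
    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}
    };

    // One left-to-right pass; an '&' that starts no known entity is copied unchanged.
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        outer:
        while (i < s.length()) {
            for (String[] e : ENTITIES) {
                if (s.startsWith(e[0], i)) {
                    out.append(e[1]);
                    i += e[0].length();
                    continue outer;
                }
            }
            out.append(s.charAt(i++));
        }
        return out.toString();
    }

    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Repeat single passes until the string stops changing (the fixed point).
    static String unescapeFully(String s) {
        String current = s;
        String next = unescape(current);
        while (!next.equals(current)) {
            current = next;
            next = unescape(current);
        }
        return current;
    }

    static String idempotentEscape(String s) {
        return escape(unescapeFully(s));
    }
}
```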
>>>>>>>>>>
>>>>>>>>>> Such a method might not be all that valuable to the user, though.
>>>>>>>>>> Furthermore, this just explains one way to create such an idempotent
>>>>>>>>>> method. Whether more methods, or more valuable ones, exist would take
>>>>>>>>>> some more thought.
>>>>>>>>>>
>>>>>>>>>> Anyone have any thoughts? My feeling is that it might be more effort than
>>>>>>>>>> it's worth to ensure that any string is only “singly encoded.” Further, we
>>>>>>>>>> should probably take a look at the “escape_once” methods in
>>>>>>>>>> StringEscapeUtils.
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> -Rob
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>


