On 21 February 2017 at 12:40, Rob Tompkins <chtom...@apache.org> wrote:
>
>> On Feb 21, 2017, at 6:02 AM, sebb <seb...@gmail.com> wrote:
>>
>> On 21 February 2017 at 04:40, Sampanna Kahu <sampy...@gmail.com> wrote:
>>> Hi Guys,
>>> Very good points are being made above. Please allow me to add my two cents
>>> :-)
>>>
>>> What if the string contains syntactically valid HTML characters/tags and
>>> our aim is to prevent rendering these tags in the browser when this string
>>> is being served via a web application? Or prevent the execution of harmful
>>> embedded scripts when serving it? The 'escapeOnce' method could be useful
>>> here, right?
>>
>> I don't think so.
>>
>>> To explain better, let's consider an example of the specific use-case that
>>> I had in mind when building the 'escapeOnce' method:
>>> Consider the scenario of a simple RESTful web application where users can
>>> manipulate their text using simple CRUD operations. Let's assume that we do
>>> not have the 'escapeOnce' method yet.
>>> 1. A user comes and submits his string. We escape it and store it in our
>>> database. If the string had any HTML characters, they would have gotten
>>> escaped.
>>>
>>> 2. After some time, the same user fetches his string, adds some more HTML
>>> characters and submits it. At this point, although the escape method would
>>> correctly escape the freshly added HTML characters, it would escape the
>>> older escaped HTML characters again! (for example &gt; would become
>>> &amp;gt;)
>>> And this effect gets magnified if step number 2 above is repeated.
>>
>> Of course, that is my point.
>>
>> Also remember that you want to show the original string to the user.
>> That's not possible in general if you use this approach.
>>
>> Suppose they originally entered
>>
>> "To code ampersand (&) in HTML, use '&amp;'"
>>
>> Using escapeOnce, this would become:
>>
>> "To code ampersand (&amp;) in HTML, use '&amp;'"
>>
>> You can either show that directly to the user, or use an unescapeOnce
>> and show them:
>>
>> "To code ampersand (&) in HTML, use '&'"
>>
>> Neither makes any sense.
>>
>>> How do we solve the above problem without the 'escapeOnce' method?
>>
>> Store the raw string in the database and escape it just before display.
>>
>> If you are using Javascript, then use an approach such as this to escape it:
>>
>> document.getElementById("whereItGoes").appendChild(document.createTextNode(unsafe_str));
>>
>> See:
>>
>> http://shebang.brandonmintern.com/foolproof-html-escaping-in-javascript/
>>
>> This has a good discussion of some of the problems.
>>
>> ==
>>
>> Sorry, but it's not possible in general to do what you want, because
>> one cannot reliably determine if a string has been escaped just from
>> looking at the string.
>
> Another thought occurred to me (again despite potential lack of value).
>
> We should be able to quickly verify if there are any escape strings in the
> string in question. A single application of unescape followed by checking
> string equality with the original input would yield a predicate on the
> existence of escapes present in the input in question.
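A minimal sketch of the check described in the paragraph above, assuming for simplicity that only &, < and > are ever escaped. The class name and the escape/unescape helpers are hypothetical stand-ins written for this illustration; they are not the Commons Text API.

```java
// Hypothetical helpers for illustration only; not the Commons Text API.
final class EscapePredicateSketch {

    // Escape '&' first so the ampersands introduced for '<' and '>' are not escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Undo exactly one level of escaping; "&amp;" is handled last so its output is not rescanned.
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // The predicate: a single unescape pass changes the string
    // if and only if the string contains a recognised escape sequence.
    static boolean containsEscapes(String s) {
        return !unescape(s).equals(s);
    }

    public static void main(String[] args) {
        System.out.println(containsEscapes("This & that"));      // false
        System.out.println(containsEscapes("This &amp; that"));  // true

        // Raw text typed by a user that happens to contain "&amp;" also reports true,
        // so the predicate detects escape sequences, not whether escaping was applied.
        String typedByUser = "To code ampersand (&) in HTML, use '&amp;'";
        System.out.println(containsEscapes(typedByUser));        // true
    }
}
```

The last case is the one that matters for this thread: the predicate answers "does the string contain escape sequences?", which is not the same question as "has this string already been escaped?".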
Again, what does unescape mean in this context?
Does it ignore incomplete escape sequences, or throw an error?

> From there we could: (1) escape if no escapes were present in the original,
> or (2) throw an exception if there were escapes present in the original
> string.
> Again, this feels contrived, so I’m not really suggesting that we add it. I’m
> just playing with ideas here that could accomplish what Sampanna is going for.

The request is impossible to fulfill reliably, and does not deserve to
be added to a Commons library.
I don't know why this is still being discussed.

> -Rob
>
>>
>> The most one can do is to sanitise the string by escaping anything
>> that is unescaped.
>> However that process corrupts the input - a browser won't display the
>> proper output.
>>
>>> On 20 February 2017 at 21:40, sebb <seb...@gmail.com> wrote:
>>>
>>>> On 20 February 2017 at 15:36, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>
>>>>>> On Feb 20, 2017, at 10:30 AM, sebb <seb...@gmail.com> wrote:
>>>>>>
>>>>>> On 20 February 2017 at 14:55, Rob Tompkins <chtom...@apache.org> wrote:
>>>>>>>
>>>>>>>> On Feb 20, 2017, at 4:31 AM, sebb <seb...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 19 February 2017 at 14:29, Raymond DeCampo <r...@decampo.org> wrote:
>>>>>>>>> I am trying to see how having the proposed unescape() method leads to a
>>>>>>>>> useful escape method.
>>>>>>>>>
>>>>>>>>> E.g. clearly unescape("&amp;") would evaluate to "&". So would
>>>>>>>>> unescape("&amp;amp;"). That means the proposed escape() method would also
>>>>>>>>> have the same output for "&amp;" and "&amp;amp;".
>>>>>>>>>
>>>>>>>>> I think a better approach for an idempotent escape would be to just
>>>>>>>>> unescape the string once, and then run the traditional escape.
>>>>>>>>
>>>>>>>> That does not eliminate the problems, as you state below.
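For reference, the "unescape once, then run the traditional escape" approach quoted above can be sketched as follows. This is only an illustration under the assumption that &, < and > are the only escaped characters; escapeOnce, escape and unescape here are hypothetical helpers, not the methods under review in TEXT-40.

```java
// Hypothetical helpers for illustration only; not the Commons Text API.
final class EscapeOnceSketch {

    // Escape '&' first so the ampersands introduced for '<' and '>' are not escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Undo exactly one level of escaping; "&amp;" is handled last so its output is not rescanned.
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // Strip one level of escaping, then escape normally.
    static String escapeOnce(String s) {
        return escape(unescape(s));
    }

    public static void main(String[] args) {
        String once = escapeOnce("a < b");
        System.out.println(once);                 // a &lt; b
        System.out.println(escapeOnce(once));     // a &lt; b (a second call changes nothing)

        // Raw "&" and raw "&amp;" both map to "&amp;",
        // so the original input cannot be recovered from the result.
        System.out.println(escapeOnce("&"));      // &amp;
        System.out.println(escapeOnce("&amp;"));  // &amp;
    }
}
```

In this sketch the method is idempotent, but it achieves that by mapping two different inputs to the same output, which is the information loss being objected to.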
>>>>>>>>
>>>>>>>>> You will
>>>>>>>>> still have issues if the user intended to escape the string "&amp;" but you
>>>>>>>>> are never going to crack that without some kind of state saving.
>>>>>>>>
>>>>>>>> That is my exact point.
>>>>>>>>
>>>>>>>> Since it's not possible for the function to work reliably, we should
>>>>>>>> not mislead users by pretending that there is a magic method that
>>>>>>>> works.
>>>>>>>>
>>>>>>>>> Then given that the functionality is available via two consecutive calls to
>>>>>>>>> existing methods, I would probably be disinclined to include it in the
>>>>>>>>> library.
>>>>>>>>
>>>>>>>> +1
>>>>>>>
>>>>>>> I’m a (+1) for removal as well.
>>>>>>>
>>>>>>> Also, I didn’t mean for my example to sound like a proposal. I merely was
>>>>>>> trying to get to a potentially valuable stateless idempotent string
>>>>>>> escape function. Its contrivance is quite clear.
>>>>>>>
>>>>>>> Any other comments out there?
>>>>>>>
>>>>>>> We could provide a stateful escaper (that figures out how many escapes
>>>>>>> a string is in), or a method that returns the number of escapes a string
>>>>>>> is in. Again, I’m not all that sure on the value of such methods.
>>>>>>
>>>>>> I don't think it's possible to work out the number of times a string
>>>>>> has been escaped.
>>>>>
>>>>> That may indeed be true, but it is possible to return the number of
>>>>> times unescape need be run before subsequent unescapes yield the same
>>>>> result.
>>>>
>>>> That in itself is potentially ambiguous.
>>>> Does the unescaper keep going until there are no valid escape
>>>> sequences left, or does it stop when there is at least one ampersand
>>>> which is not part of a valid escape sequence?
>>>>
>>>>> Again, I’m not sure if this is a valuable measure to concern ourselves
>>>>> with.
>>>>
>>>> I don't think it provides anything useful.
>>>>
>>>>>>
>>>>>> The most one can do is to determine if a string has not been escaped.
>>>>>> That would be the case where a string has one or more unescaped
>>>>>> characters in it.
>>>>>> For example "This & that" has obviously not been escaped.
>>>>>>
>>>>>> However if a string has no un-escaped characters in it, that does not
>>>>>> necessarily mean that it has already been escaped.
>>>>>> For example: "This &amp; that".
>>>>>> This might have been escaped - or it might not.
>>>>>
>>>>> Ah, I was using the definition of “having been escaped” to be that the
>>>>> string contains escape sequences.
>>>>>
>>>>>> For example it could be the answer to: "How does one code 'This &
>>>>>> that' in HTML?”
>>>>>>
>>>>>> The application has to keep track of the escape-state of the string.
>>>>>
>>>>> Definitely agreed with your definition of “having been escaped."
>>>>>
>>>>>>
>>>>>>> Cheers,
>>>>>>> -Rob
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Sat, Feb 18, 2017 at 12:04 PM, Rob Tompkins <chtom...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> In preparation for the 1.0 release, I think we should address Sebb's
>>>>>>>>>> concern in TEXT-40 about the attempt to create "idempotent" string escape
>>>>>>>>>> methods. By idempotent I mean someMethod("some string") =
>>>>>>>>>> someMethod(someMethod(someMethod(...someMethod("some string")))), a
>>>>>>>>>> single application of a method is equal to any number of the applications
>>>>>>>>>> of the method on the same input.
>>>>>>>>>>
>>>>>>>>>> Below I lay out a mechanism by which it is possible to write such methods,
>>>>>>>>>> but I don’t know the value in writing such methods. I'm merely expressing
>>>>>>>>>> that idempotency is a possibility.
>>>>>>>>>>
>>>>>>>>>> For string "un-escaping", I believe that we can write a method that,
>>>>>>>>>> indeed, is idempotent by simply running the un-escape method the finite
>>>>>>>>>> number of un-escapings to get to the point at which the string remains
>>>>>>>>>> unchanged between applications of the un-escaping method.
>>>>>>>>>> (I believe that I
>>>>>>>>>> can write a proof that all un-escape methods have such a point, if that is
>>>>>>>>>> needed for the sake of discussion).
>>>>>>>>>>
>>>>>>>>>> If indeed we can create an idempotent un-escape method, then we can simply
>>>>>>>>>> take that method, run it, and then run the escaping method one time. If we
>>>>>>>>>> always completely unescape and then escape once then we do have an
>>>>>>>>>> idempotent method.
>>>>>>>>>>
>>>>>>>>>> Such a method might not be all that valuable to the user though.
>>>>>>>>>> Furthermore, this just explains one way to create such an idempotent
>>>>>>>>>> method. Whether or not more or more valuable methods exist would take
>>>>>>>>>> some more thought.
>>>>>>>>>>
>>>>>>>>>> Anyone have any thoughts? My feeling is that it might be more effort than
>>>>>>>>>> it's worth to ensure that any string is only "singly encoded.” Further, we
>>>>>>>>>> probably should give a look at the “escape_once” methods in
>>>>>>>>>> StringEscapeUtils.
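The "completely unescape, then escape once" mechanism in the original post can be sketched roughly as below, again assuming that only &, < and > are escaped. All names (unescapeFully, unescapeDepth, idempotentEscape) are hypothetical and chosen for this illustration; none of them exist in Commons Text.

```java
// Hypothetical helpers for illustration only; not the Commons Text API.
final class FixpointEscapeSketch {

    // Escape '&' first so the ampersands introduced for '<' and '>' are not escaped again.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Undo exactly one level of escaping; "&amp;" is handled last so its output is not rescanned.
    static String unescape(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&");
    }

    // Repeat unescape until a pass leaves the string unchanged. Every pass that
    // changes the string also shortens it, so the loop always terminates.
    static String unescapeFully(String s) {
        String previous;
        do {
            previous = s;
            s = unescape(s);
        } while (!s.equals(previous));
        return s;
    }

    // Number of unescape passes that actually change the string.
    static int unescapeDepth(String s) {
        int passes = 0;
        String next = unescape(s);
        while (!next.equals(s)) {
            passes++;
            s = next;
            next = unescape(s);
        }
        return passes;
    }

    // Completely unescape, then escape exactly once.
    static String idempotentEscape(String s) {
        return escape(unescapeFully(s));
    }

    public static void main(String[] args) {
        System.out.println(unescapeDepth("&amp;amp;lt;"));     // 3
        System.out.println(idempotentEscape("&amp;amp;lt;"));  // &lt;
        System.out.println(idempotentEscape("<"));             // &lt;
    }
}
```

Under these assumptions the result is stable under repeated application, but "<", "&lt;" and "&amp;lt;" all collapse to the same output, so the depth counts unescape passes rather than how often the caller really escaped the text.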
>>>>>>>>>>
>>>>>>>>>> Cheers
>>>>>>>>>> -Rob
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>>>>>>>>>> For additional commands, e-mail: dev-h...@commons.apache.org