Hi Antranig, Following up yesterday's dev meeting, I'm helping with testing the regex that was modified in this jira.
What I've done: 1. I came up with a string that contains all the keyboard characters and letters in lower & upper case 2. input the string into inline edit fields of these demos: src/webapp/demos/inlineEdit/simple/html/inlineEdit.html src/webapp/demos/inlineEdit/rich/html/inlineEdit.html 3. ensure none of the characters in the string was lost during edit, save, undo, redo. So far so good. Do you think this test is decent enough? Anything else I should test? Thanks. Cindy ________________________________________ From: Antranig Basman [[email protected]] Sent: 31 May 2011 21:02 To: Fluid Work; Li, Cindy Subject: FLUID-4132 Hi there - Justin has asked me to review the pull branch at https://github.com/fluid-project/infusion/pull/22 - in particular the regexp which on line 67 of InlineEditIntegrations.js has been changed from /\<(\S+)[^\>\s]*\>/g to /\<([a-z0-9A-Z\/]+)\>/g to head off a JSLint error. From what I can see, the error JSLint is reporting is Problem at line 67 character 38: Insecure '^'. This seems like a pretty subtle issue... hunting around, I find this fairly clear explanation at stackOverflow of what Crockford's intent might have been when issuing this error. http://stackoverflow.com/questions/4109214/jslint-insecure-in-regular-expression I can think of a few ways of approaching this issue... Firstly, I think an important angle is that this regular expression is being used for canonicalization, rather than for validation. The purpose of the "normalizeHTML" function is to fold different expressions of essentially the same markup into a canonical string form, so that it is possible to tell whether, by operating the rich text control, the user really made a substantial change to the markup or not, by simply comparing the mapped string for equality. Or more accurately, the user experience we want is NOT for the user to open the rich text control, issue a "save" without making any changes whatsoever, but for the control to have considered there has been a change. So - it is not necessary for the canonicalizer to be particularly "aggressive", or accurate - but we need to make sure that there are no false positives of the kind just described. Carrying on, the particular regexp that was put there was not terribly "well-attested" - this falls into that grey area of functionality that is very hard to address with unit tests since it involves actually taking the particular rich text control in question through its lifecycle and seeing what happens. That said, it is not certainly impossible to construct a test for this function as it is for focus/blur issues, just probably rather difficult. The "acid test" for this regexp consists of taking our "demo" page for the rich text control through the cycle of triggering edit, saving immediately, and seeing if the "undo" widget appears or not. Some further things we could say about the regexp - if its domain were intended to be all of XML, it would be no good - http://www.w3.org/TR/2004/REC-xml11-20040204/ - there are numerous characters in "NameChar" and "NameStartChar" that aren't covered. However, given it is only intended to operate on the standard dialect of HTML tags, it is probably good enough - http://www.w3.org/TR/html401/sgml/dtd.html . Another thing to say about the regexp.... it looks like it is trying to "do that thing which should never be done", which is, "to parse HTML using a regexp" - http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags All we can say here is to fall back on the role of the regexp precisely - just as it is not intended for "validation", it is also not intended for "parsing" - all we need is "good enough canonicalisation for this purpose" which I think that Cindy's regexp achieves - so long as it passes the manual test. This concept of "good enough" does depend on the exact rich text controls that we are supporting - the task may have got easier since we ditched FCKEditor. The final thing we could say about the regexp - this dates from an era where our community standards for these kinds of things were lower than they are now, otherwise the author (in this instance, me) would have put a comment linking to a JIRA explaining at least some of these issues which are listed above :) ========= The bottom line - since the regexp is used for canonicalisation rather than validation, I don't think Crockford's warning needs to be taken seriously - however, we may as well get rid of it by rewriting the expression if we can, as long as we can verify that the expression is still good for its original purpose by running manual tests on our integrations demo for the rich text widgets we are intending to support. _______________________________________________________ fluid-work mailing list - [email protected] To unsubscribe, change settings or access archives, see http://fluidproject.org/mailman/listinfo/fluid-work
