Hi Cindy - the purpose of the regex is actually to canonicalise HTML tag names rather than user text, so your test should include editing a field which contains an initial value which is some markup. THe issue is not so much that of "loss" as of false detection of change - so the acid test is to ensure that a field with an ORIGINAL rich text value can be edited, and immediately saved without triggering change detection. The tests you have performed are useful too - but those can actually be done in automated tests rather than needing to be done manually.

Cheers,
Antranig

On 02/06/2011 13:02, Li, Cindy wrote:
Hi Antranig,

Following up yesterday's dev meeting, I'm helping with testing the regex that 
was modified in this jira.

What I've done:

1. I came up with a string that contains all the keyboard characters and letters in 
lower&  upper case
2. input the string into inline edit fields of these demos:

src/webapp/demos/inlineEdit/simple/html/inlineEdit.html
src/webapp/demos/inlineEdit/rich/html/inlineEdit.html

3. ensure none of the characters in the string was lost during edit, save, 
undo, redo.

So far so good.

Do you think this test is decent enough? Anything else I should test? Thanks.

Cindy
________________________________________
From: Antranig Basman [[email protected]]
Sent: 31 May 2011 21:02
To: Fluid Work; Li, Cindy
Subject: FLUID-4132

Hi there - Justin has asked me to review the pull branch at
https://github.com/fluid-project/infusion/pull/22 - in particular the regexp 
which on line 67 of
InlineEditIntegrations.js has been changed from /\<(\S+)[^\>\s]*\>/g to 
/\<([a-z0-9A-Z\/]+)\>/g to head off
a JSLint error.

  From what I can see, the error JSLint is reporting is

Problem at line 67 character 38: Insecure '^'.

This seems like a pretty subtle issue... hunting around, I find this fairly 
clear explanation at
stackOverflow of what Crockford's intent might have been when issuing this 
error.

http://stackoverflow.com/questions/4109214/jslint-insecure-in-regular-expression

I can think of a few ways of approaching this issue...

Firstly, I think an important angle is that this regular expression is being 
used for canonicalization,
rather than for validation. The purpose of the "normalizeHTML" function is to 
fold different expressions of
essentially the same markup into a canonical string form, so that it is 
possible to tell whether, by
operating the rich text control, the user really made a substantial change to 
the markup or not, by simply
comparing the mapped string for equality. Or more accurately, the user 
experience we want is NOT for the
user to open the rich text control, issue a "save" without making any changes 
whatsoever, but for the
control to have considered there has been a change.

So - it is not necessary for the canonicalizer to be particularly "aggressive", 
or accurate - but we need to
make sure that there are no false positives of the kind just described.

Carrying on, the particular regexp that was put there was not terribly 
"well-attested" - this falls into
that grey area of functionality that is very hard to address with unit tests 
since it involves actually
taking the particular rich text control in question through its lifecycle and 
seeing what happens. That
said, it is not certainly impossible to construct a test for this function as 
it is for focus/blur issues,
just probably rather difficult. The "acid test" for this regexp consists of taking our 
"demo" page for the
rich text control through the cycle of triggering edit, saving immediately, and seeing if 
the "undo" widget
appears or not.

Some further things we could say about the regexp - if its domain were intended 
to be all of XML, it would
be no good - http://www.w3.org/TR/2004/REC-xml11-20040204/ - there are numerous 
characters in "NameChar" and
"NameStartChar" that aren't covered. However, given it is only intended to 
operate on the standard dialect
of HTML tags, it is probably good enough - 
http://www.w3.org/TR/html401/sgml/dtd.html .

Another thing to say about the regexp.... it looks like it is trying to "do 
that thing which should never be
done", which is, "to parse HTML using a regexp" -
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

All we can say here is to fall back on the role of the regexp precisely - just 
as it is not intended for
"validation", it is also not intended for "parsing" - all we need is "good 
enough canonicalisation for this
purpose" which I think that Cindy's regexp achieves - so long as it passes the 
manual test. This concept of
"good enough" does depend on the exact rich text controls that we are 
supporting - the task may have got
easier since we ditched FCKEditor.

The final thing we could say about the regexp - this dates from an era where 
our community standards for
these kinds of things were lower than they are now, otherwise the author (in 
this instance, me) would have
put a comment linking to a JIRA explaining at least some of these issues which 
are listed above :)

=========
The bottom line - since the regexp is used for canonicalisation rather than 
validation, I don't think
Crockford's warning needs to be taken seriously - however, we may as well get 
rid of it by rewriting the
expression if we can, as long as we can verify that the expression is still 
good for its original purpose by
running manual tests on our integrations demo for the rich text widgets we are 
intending to support.


_______________________________________________________
fluid-work mailing list - [email protected]
To unsubscribe, change settings or access archives,
see http://fluidproject.org/mailman/listinfo/fluid-work

Reply via email to