Hi Cindy - the purpose of the regex is actually to canonicalise HTML tag names rather than user text, so
your test should include editing a field which contains an initial value which is some markup. THe issue is
not so much that of "loss" as of false detection of change - so the acid test is to ensure that a field with
an ORIGINAL rich text value can be edited, and immediately saved without triggering change detection. The
tests you have performed are useful too - but those can actually be done in automated tests rather than
needing to be done manually.
Cheers,
Antranig
On 02/06/2011 13:02, Li, Cindy wrote:
Hi Antranig,
Following up yesterday's dev meeting, I'm helping with testing the regex that
was modified in this jira.
What I've done:
1. I came up with a string that contains all the keyboard characters and letters in
lower& upper case
2. input the string into inline edit fields of these demos:
src/webapp/demos/inlineEdit/simple/html/inlineEdit.html
src/webapp/demos/inlineEdit/rich/html/inlineEdit.html
3. ensure none of the characters in the string was lost during edit, save,
undo, redo.
So far so good.
Do you think this test is decent enough? Anything else I should test? Thanks.
Cindy
________________________________________
From: Antranig Basman [[email protected]]
Sent: 31 May 2011 21:02
To: Fluid Work; Li, Cindy
Subject: FLUID-4132
Hi there - Justin has asked me to review the pull branch at
https://github.com/fluid-project/infusion/pull/22 - in particular the regexp
which on line 67 of
InlineEditIntegrations.js has been changed from /\<(\S+)[^\>\s]*\>/g to
/\<([a-z0-9A-Z\/]+)\>/g to head off
a JSLint error.
From what I can see, the error JSLint is reporting is
Problem at line 67 character 38: Insecure '^'.
This seems like a pretty subtle issue... hunting around, I find this fairly
clear explanation at
stackOverflow of what Crockford's intent might have been when issuing this
error.
http://stackoverflow.com/questions/4109214/jslint-insecure-in-regular-expression
I can think of a few ways of approaching this issue...
Firstly, I think an important angle is that this regular expression is being
used for canonicalization,
rather than for validation. The purpose of the "normalizeHTML" function is to
fold different expressions of
essentially the same markup into a canonical string form, so that it is
possible to tell whether, by
operating the rich text control, the user really made a substantial change to
the markup or not, by simply
comparing the mapped string for equality. Or more accurately, the user
experience we want is NOT for the
user to open the rich text control, issue a "save" without making any changes
whatsoever, but for the
control to have considered there has been a change.
So - it is not necessary for the canonicalizer to be particularly "aggressive",
or accurate - but we need to
make sure that there are no false positives of the kind just described.
Carrying on, the particular regexp that was put there was not terribly
"well-attested" - this falls into
that grey area of functionality that is very hard to address with unit tests
since it involves actually
taking the particular rich text control in question through its lifecycle and
seeing what happens. That
said, it is not certainly impossible to construct a test for this function as
it is for focus/blur issues,
just probably rather difficult. The "acid test" for this regexp consists of taking our
"demo" page for the
rich text control through the cycle of triggering edit, saving immediately, and seeing if
the "undo" widget
appears or not.
Some further things we could say about the regexp - if its domain were intended
to be all of XML, it would
be no good - http://www.w3.org/TR/2004/REC-xml11-20040204/ - there are numerous
characters in "NameChar" and
"NameStartChar" that aren't covered. However, given it is only intended to
operate on the standard dialect
of HTML tags, it is probably good enough -
http://www.w3.org/TR/html401/sgml/dtd.html .
Another thing to say about the regexp.... it looks like it is trying to "do
that thing which should never be
done", which is, "to parse HTML using a regexp" -
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
All we can say here is to fall back on the role of the regexp precisely - just
as it is not intended for
"validation", it is also not intended for "parsing" - all we need is "good
enough canonicalisation for this
purpose" which I think that Cindy's regexp achieves - so long as it passes the
manual test. This concept of
"good enough" does depend on the exact rich text controls that we are
supporting - the task may have got
easier since we ditched FCKEditor.
The final thing we could say about the regexp - this dates from an era where
our community standards for
these kinds of things were lower than they are now, otherwise the author (in
this instance, me) would have
put a comment linking to a JIRA explaining at least some of these issues which
are listed above :)
=========
The bottom line - since the regexp is used for canonicalisation rather than
validation, I don't think
Crockford's warning needs to be taken seriously - however, we may as well get
rid of it by rewriting the
expression if we can, as long as we can verify that the expression is still
good for its original purpose by
running manual tests on our integrations demo for the rich text widgets we are
intending to support.
_______________________________________________________
fluid-work mailing list - [email protected]
To unsubscribe, change settings or access archives,
see http://fluidproject.org/mailman/listinfo/fluid-work