Test errors in parsing named entities

Matt Wed, 15 Sep 2010 01:56:36 -0700

The following entity tests all look like they're incorrect:

Name: Entity in attribute without semicolon ending in x
Input: <h a='&notx'>


Name: Entity in attribute without semicolon ending in 1
Input: <h a='&not1'>

Name: Entity in attribute without semicolon ending in i
Input: <h a='&noti'>

Name: Undefined named entity in attribute value ending in semicolon
and whose name starts with a known entity name.
Input: <h a='&noti;'>

In each case, the test expects a single error (presumably for the
missing semicolon after the last matched character, which is 't'),
but the wording of the spec does NOT make these cases an error.

Here's the relevant text from the spec:

"If the character reference is being consumed as part of an attribute,
and the last character matched is not a U+003B SEMICOLON character
(;), and the next character is either a U+003D EQUALS SIGN character
(=) or in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9),
U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or
U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for
historical reasons, all the characters that were matched after the
U+0026 AMPERSAND character (&) must be unconsumed, and nothing is
returned."

1. The character reference is being consumed as part of an attribute.
2. The last character matched ('t') is not a semicolon.
3. In all four cases, the next character is alphanumeric ('x', '1',
or 'i'), so the spec says to unconsume the matched characters and
return nothing, without emitting an error.
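
The three conditions above can be sketched in Python. This is only my
reading of the quoted spec text, not html5lib's actual code; the function
name and its parameters are made up for illustration.

```python
import string

# ASCII letters and digits, i.e. the ranges named in the quoted spec text.
ALNUM = frozenset(string.ascii_letters + string.digits)


def is_silently_unconsumed(in_attribute, last_matched, next_char):
    """Return True when the quoted rule applies: the matched characters
    are unconsumed and nothing is returned (no parse error is mentioned).

    in_attribute -- True if the reference is inside an attribute value
    last_matched -- last character of the matched entity name, e.g. 't'
    next_char    -- the character following the match ('' at end of input)
    """
    return (
        in_attribute
        and last_matched != ";"
        and (next_char == "=" or next_char in ALNUM)
    )
```

For the four inputs above, last_matched is 't' and next_char is 'x', '1',
'i' and 'i' respectively, so the function returns True each time.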

The other set of tests that appear incorrect deals with the opposite
case, in which no named entity was matched.  These are the "Bad
named entity: XXXXX without a semi-colon" tests, along with a few
others:

Name: Entity name followed by the equals sign in an attribute value.
Input: <h a='&lang='>

Name: Entity without a name
Input: &;

Name: Non-allowed ' after ampersand in attribute value
Input: <z z=\"&'\">

Name: Non-allowed \" after ampersand in attribute value
Input: <z z='&\"'>

Name: Non-ASCII character reference name
Input: &\u00AC;

Name: Partial entity match at end of file
Input: I'm &no

Name: Text after bogus character reference
Input: <z z='&xlink_xmlns;'>bar<z>

Name: Unfinished entity
Input: &f

Here's the relevant text from the spec:

"If no match can be made, then no characters are consumed, and nothing
is returned. In this case, if the characters after the U+0026
AMPERSAND character (&) consist of a sequence of one or more
characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE
(9), U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z, and
U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z,
followed by a U+003B SEMICOLON character (;), then this is a parse
error."

These tests expect an error, but none should be emitted: for an
unmatched named entity to be a parse error, the ampersand must be
followed by one or more alphanumeric characters and then immediately
by a semicolon, which is not the case in any of these tests.
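
To make that reading concrete, here is a rough Python sketch of the
no-match rule as I understand it from the quoted text. The function name
is hypothetical and the regular expression is my own paraphrase of the
spec wording, not code taken from html5lib.

```python
import re

# One or more ASCII alphanumerics immediately followed by a semicolon,
# anchored at the first character after the ampersand.
_UNMATCHED_ERROR = re.compile(r"[0-9A-Za-z]+;")


def unmatched_reference_is_error(after_ampersand):
    """Return True if, given that no named entity matched, the text
    following the '&' makes this a parse error under the quoted rule."""
    return _UNMATCHED_ERROR.match(after_ampersand) is not None


# Text following the '&' in each of the test inputs listed above.
samples = [
    "lang='>",
    ";",
    "'\">",
    "\"'>",
    "\u00ac;",
    "no",
    "xlink_xmlns;'>bar<z>",
    "f",
]
errors = [unmatched_reference_is_error(s) for s in samples]
```

Every entry in errors is False, whereas something like "foo;" (an unknown
name made only of letters and ending in a semicolon) would give True.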
