jenkins-bot has submitted this change and it was merged.
Change subject: Move location of tokenizing tags in attributes
......................................................................
Move location of tokenizing tags in attributes
* First off, the tag rule in its current location wasn't even being hit.
It was added at a time when attribute_preprocessor_text_line was used
to tokenize attribute names, and it properly belongs in a name
position. See I1864ec9e5586ecf465d3ae0689f638e284f1dd32.
* That raises two questions:
** Why did we not see breakage when we denormalized the
generic_attribute_name rule? Well, we made it more permissive, and
now tags break up into multiple (mostly empty) attributes, instead
of one chunk.
** Is that better, or what is the right thing to do? Or, in other
words, why not just move this to generic_attribute_name? The problem
is context. In the generic case, you want the opening xmlish tag to
break on the first closing angle bracket, >. (i.e. don't tokenize a
tag in attribute position for xmlish tags.) See the test called
"Handle broken pre-like tags". However, in the PHP parser, tags found
in attribute position of the wikitext table syntax are just dropped.
So, the original intention was to only do this for table attribute
names. Here we break off a new rule for that.
* You can see the difference in the results table of
enwiki/2002_Australian_Formula_3_Championship?oldid=533855114
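
To illustrate the context distinction described above, here is a rough
JavaScript sketch. It is a toy model, not Parsoid's tokenizer: the
function readAttributeName, the STOP and TAG patterns, and the
acceptTags flag are all hypothetical names invented for this example,
and TAG is only a crude stand-in for generic_tag. It shows an attribute
name breaking at the first > in the generic case, and swallowing whole
tags when tags are accepted, as in table attribute position.

```javascript
// Hypothetical, simplified model of the two behaviours. The real rules
// live in pegTokenizer.pegjs.txt and handle far more cases.
const STOP = /[\s\[\/=>"']/;          // characters that end an attribute name
const TAG = /^<\/?[A-Za-z][^<>]*>/;   // crude stand-in for generic_tag

function readAttributeName(input, acceptTags) {
  let pos = 0;
  let name = '';
  while (pos < input.length) {
    // Only look for a tag when the caller is in a tag-accepting context.
    const tag = acceptTags ? TAG.exec(input.slice(pos)) : null;
    if (tag) {
      // Table attribute position: take the whole tag as part of the name.
      name += tag[0];
      pos += tag[0].length;
    } else if (STOP.test(input[pos])) {
      // Generic behaviour: the name ends at the first stop character.
      break;
    } else {
      name += input[pos];
      pos += 1;
    }
  }
  return { name, rest: input.slice(pos) };
}

const sample = '<hiddentext>x</hiddentext>=1';
console.log(readAttributeName(sample, false)); // generic position
console.log(readAttributeName(sample, true));  // table attribute position
```

In this model, the generic call stops before the first >, while the
table call keeps both tags in a single name that a later stage (the
sanitizer, in the real pipeline) can strip.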
Change-Id: I941c595ef150666ea4535289d16e86c425c24389
---
M lib/pegTokenizer.pegjs.txt
1 file changed, 26 insertions(+), 16 deletions(-)
Approvals:
Subramanya Sastry: Looks good to me, approved
jenkins-bot: Verified
diff --git a/lib/pegTokenizer.pegjs.txt b/lib/pegTokenizer.pegjs.txt
index 84bb402..1daf7e1 100644
--- a/lib/pegTokenizer.pegjs.txt
+++ b/lib/pegTokenizer.pegjs.txt
@@ -1250,7 +1250,7 @@
table_attribute
= s:optionalSpaceToken
namePos0:("" { return endOffset(); })
- name:generic_attribute_name
+ name:table_attribute_name
namePos:("" { return endOffset(); })
valueData:(optionalSpaceToken
v:table_attribute_value { return v; })?
@@ -1290,6 +1290,25 @@
// /=>"' is the html5 attribute name set we do not want.
// \[ is to avoid eating links. (see: BUG 553: link with two variables in a piped link)
t:( directive / !( space_or_newline / [\[/=>"'] ) c:. { return c; }
+ ) { return t; }
+ )+ {
+ return tu.flattenString(r);
+ }
+
+// Same as generic_attribute_name, except for accepting tags found here.
+// That doesn't make sense (ie. match php) in the generic case.
+table_attribute_name
+ = r:( $[^ \t\0\n\r/=>"'!<&\[\]|{}\-]+
+ / ! inline_breaks
+ ! '/>'
+ // /=>"' is the html5 attribute name set we do not want.
+ // \[ is to avoid eating links. (see: BUG 553: link with two variables in a piped link)
+ t:( directive
+ // Accept insane tags-inside-attributes as attribute names.
+ // The sanitizer with strip and shadow them for roundtripping.
+ // Example: <hiddentext>generated with.. </hiddentext>
+ / &generic_tag nb:nested_block_line { return nb; }
+ / !( space_or_newline / [\[/=>"'] ) c:. { return c; }
) { return t; }
)+ {
return tu.flattenString(r);
@@ -2061,21 +2080,12 @@
// Variants with the entire attribute on a single line
attribute_preprocessor_text_line
= r:( $[^=<>\n\r&'"\t \[\]|{}/!\-]+
- / !inline_breaks
- ! '/>'
- t:(
- directive
- // Eat insane tags-inside-attributes. Example:
- // <hiddentext>generated with.. </hiddentext>
- / &generic_tag nb:nested_block_line { return nb; }
- / !(space_or_newline / [\[=>]) c:. {
- return c;
- }
- ) { return t; }
- )+
- {
- return tu.flattenString(r);
- }
+ / !inline_breaks
+ !'/>'
+ t:( directive
+ / !(space_or_newline / [\[=>]) c:. { return c; }
+ ) { return t; }
+ )+ { return tu.flattenString(r); }
attribute_preprocessor_text_single_line
= r:( $[^{}&'<\n\-]+
--
To view, visit https://gerrit.wikimedia.org/r/229941
Gerrit-MessageType: merged
Gerrit-Change-Id: I941c595ef150666ea4535289d16e86c425c24389
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/services/parsoid
Gerrit-Branch: master
Gerrit-Owner: Arlolra <[email protected]>
Gerrit-Reviewer: Arlolra <[email protected]>
Gerrit-Reviewer: Cscott <[email protected]>
Gerrit-Reviewer: Subramanya Sastry <[email protected]>
Gerrit-Reviewer: Tim Starling <[email protected]>
Gerrit-Reviewer: jenkins-bot <>