jenkins-bot has submitted this change and it was merged.

Change subject: Move location of tokenizing tags in attributes
......................................................................


Move location of tokenizing tags in attributes

 * First off, the tag rule in its current location wasn't even being hit.
   It was added at a time when attribute_preprocessor_text_line was used
   to tokenize attribute names, and it properly belongs in a name
   position. See I1864ec9e5586ecf465d3ae0689f638e284f1dd32.

 * That raises two questions:

   ** Why did we not see breakage when we denormalized the
      generic_attribute_name rule? Well, we made it more permissive, and
      now tags break up into multiple (mostly empty) attributes, instead
      of one chunk.

   ** Is that better, or what's the right thing to do? In other words,
      why not just move this to generic_attribute_name? The problem is
      context. In the generic case, you want the opening xmlish tag to
      break on the first closing angle bracket, >. (i.e. don't tokenize
      a tag in attribute position for xmlish tags.) See the test called
      "Handle broken pre-like tags". However, in the PHP parser, tags
      found in attribute position of the wikitext table syntax are just
      dropped. So, the original intention was to only do this for table
      attribute names. Here we break off a new rule for that.

 * You can see the difference in the results table of,
   enwiki/2002_Australian_Formula_3_Championship?oldid=533855114
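
   The contrast between the two name rules can be sketched with a toy
   scanner. This is an illustration under stated assumptions, not
   Parsoid code: genericAttributeName, tableAttributeName and NAME_STOP
   are hypothetical names, and the regular expression only approximates
   what &generic_tag nested_block_line accepts in the grammar.

```javascript
// Toy illustration only. These helpers are hypothetical and are NOT part
// of Parsoid's tokenizer; they only mimic the two behaviours described above.

// Characters that end an attribute name in this sketch.
const NAME_STOP = /[\s\[\/=>"']/;

// Generic (xmlish) case: a tag in name position is never tokenized,
// so scanning stops at the first '>'.
function genericAttributeName(src) {
  let i = 0;
  while (i < src.length && !NAME_STOP.test(src[i])) {
    i++;
  }
  return src.slice(0, i);
}

// Table case: a well-formed <tag>...</tag> on one line is swallowed whole
// as part of the name, so a later stage can strip it.
function tableAttributeName(src) {
  let i = 0;
  while (i < src.length) {
    const tag = /^<([a-z]+)>[^\n]*?<\/\1>/i.exec(src.slice(i));
    if (tag) {
      i += tag[0].length; // consume the whole nested tag
      continue;
    }
    if (NAME_STOP.test(src[i])) {
      break;
    }
    i++;
  }
  return src.slice(0, i);
}

const input = "<hiddentext>generated with.. </hiddentext>";
console.log(genericAttributeName(input)); // "<hiddentext"
console.log(tableAttributeName(input));   // the whole input string
```

   In the sketch, an ordinary name such as "class" comes out the same
   from both functions; only input containing a tag differs.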

Change-Id: I941c595ef150666ea4535289d16e86c425c24389
---
M lib/pegTokenizer.pegjs.txt
1 file changed, 26 insertions(+), 16 deletions(-)

Approvals:
  Subramanya Sastry: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/lib/pegTokenizer.pegjs.txt b/lib/pegTokenizer.pegjs.txt
index 84bb402..1daf7e1 100644
--- a/lib/pegTokenizer.pegjs.txt
+++ b/lib/pegTokenizer.pegjs.txt
@@ -1250,7 +1250,7 @@
 table_attribute
   = s:optionalSpaceToken
     namePos0:("" { return endOffset(); })
-    name:generic_attribute_name
+    name:table_attribute_name
     namePos:("" { return endOffset(); })
     valueData:(optionalSpaceToken
         v:table_attribute_value { return v; })?
@@ -1290,6 +1290,25 @@
           // /=>"' is the html5 attribute name set we do not want.
           // \[ is to avoid eating links. (see: BUG 553: link with two variables in a piped link)
           t:( directive / !( space_or_newline / [\[/=>"'] ) c:. { return c; }
+        ) { return t; }
+      )+ {
+    return tu.flattenString(r);
+  }
+
+// Same as generic_attribute_name, except for accepting tags found here.
+// That doesn't make sense (ie. match php) in the generic case.
+table_attribute_name
+  = r:( $[^ \t\0\n\r/=>"'!<&\[\]|{}\-]+
+        / ! inline_breaks
+          ! '/>'
+          // /=>"' is the html5 attribute name set we do not want.
+          // \[ is to avoid eating links. (see: BUG 553: link with two variables in a piped link)
+          t:( directive
+              // Accept insane tags-inside-attributes as attribute names.
+              // The sanitizer with strip and shadow them for roundtripping.
+              // Example: <hiddentext>generated with.. </hiddentext>
+              / &generic_tag nb:nested_block_line { return nb; }
+              / !( space_or_newline / [\[/=>"'] ) c:. { return c; }
         ) { return t; }
       )+ {
     return tu.flattenString(r);
@@ -2061,21 +2080,12 @@
 // Variants with the entire attribute on a single line
 attribute_preprocessor_text_line
   = r:( $[^=<>\n\r&'"\t \[\]|{}/!\-]+
-        /  !inline_breaks
-            ! '/>'
-            t:(
-                directive
-              // Eat insane tags-inside-attributes. Example:
-              // <hiddentext>generated with.. </hiddentext>
-              / &generic_tag nb:nested_block_line { return nb; }
-              / !(space_or_newline / [\[=>]) c:. {
-                    return c;
-                }
-            ) { return t; }
-      )+
-  {
-      return tu.flattenString(r);
-  }
+        / !inline_breaks
+          !'/>'
+          t:( directive
+              / !(space_or_newline / [\[=>]) c:. { return c; }
+          ) { return t; }
+    )+ { return tu.flattenString(r); }
 
 attribute_preprocessor_text_single_line
   = r:( $[^{}&'<\n\-]+

-- 
To view, visit https://gerrit.wikimedia.org/r/229941
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I941c595ef150666ea4535289d16e86c425c24389
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/services/parsoid
Gerrit-Branch: master
Gerrit-Owner: Arlolra <[email protected]>
Gerrit-Reviewer: Arlolra <[email protected]>
Gerrit-Reviewer: Cscott <[email protected]>
Gerrit-Reviewer: Subramanya Sastry <[email protected]>
Gerrit-Reviewer: Tim Starling <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
