Cscott has uploaded a new change for review.
https://gerrit.wikimedia.org/r/77907
Change subject: Non-word characters don't terminate tag names.
......................................................................
Non-word characters don't terminate tag names.
The PHP sanitizer matched tag names with \w+ only, so any non-word
character ended the name. As a result <b.foo> and <bä> were converted
to <b> tags (bug 17663), <s.foo> and <s-id> were treated as <s> tags
(bug 40670), and <sub-ID#1> was treated as a <sub> tag (bug 52022).
Fix the sanitizer so a tag name runs up to the next whitespace, '/'
or '>'.
Bug: 17663
Bug: 40670
Bug: 52022
Change-Id: Iceec404f46703065bf080dd2cbfed1f88c204fa5
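
The effect of the pattern change can be checked in isolation. The sketch
below is only an illustration, not code from this change: tagNameFor() is
a hypothetical helper, and it assumes the sanitizer hands the pattern the
text that follows a '<', as the "$bits" comment in Sanitizer.php suggests.
The two patterns are copied from the diff below.

```php
<?php
// Hypothetical helper (not part of MediaWiki): returns the tag name that
// the given pattern captures from the text following a '<', or null when
// the pattern does not match at all.
function tagNameFor( $pattern, $bit ) {
	if ( preg_match( $pattern, $bit, $regs ) ) {
		// Capture group 2 is the tag name in both patterns.
		return $regs[2];
	}
	return null;
}

// Old pattern: the name is \w+, so it stops at the first non-word character
// and the remainder is swallowed by the lazy "params" group.
$old = '!^(/?)(\\w+)([^>]*?)(/{0,1}>)([^<]*)$!';
// New pattern: the name runs up to whitespace, '/' or '>'.
$new = '!^(/?)([^\\s/>]+)([^>]*?)(/{0,1}>)([^<]*)$!';

// Sample inputs: assumed shapes of what follows a '<' in wikitext.
$samples = array( 'b>', 's.foo>', 'sub-ID#1>', 's id="x">', 'br/>' );

foreach ( $samples as $bit ) {
	printf(
		"%-12s old=%-5s new=%s\n",
		$bit,
		tagNameFor( $old, $bit ),
		tagNameFor( $new, $bit )
	);
}
```

With the old pattern 's.foo>' yields the name 's' and 'sub-ID#1>' yields
'sub'. With the new pattern they yield 's.foo' and 'sub-ID#1', which are
not whitelisted tags. Plain tags such as 'b>', 's id="x">' and 'br/>'
give the same name under both patterns.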
---
M includes/Sanitizer.php
M includes/parser/Parser.php
M tests/parser/parserTests.txt
3 files changed, 41 insertions(+), 6 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/07/77907/1
diff --git a/includes/Sanitizer.php b/includes/Sanitizer.php
index f3a5281..1432a8b 100644
--- a/includes/Sanitizer.php
+++ b/includes/Sanitizer.php
@@ -448,7 +448,7 @@
# $params: String between element name and >
# $brace: Ending '>' or '/>'
# $rest: Everything until the next element of $bits
- if ( preg_match( '!^(/?)(\\w+)([^>]*?)(/{0,1}>)([^<]*)$!', $x, $regs ) ) {
+ if ( preg_match( '!^(/?)([^\\s/>]+)([^>]*?)(/{0,1}>)([^<]*)$!', $x, $regs ) ) {
list( /* $qbar */, $slash, $t, $params, $brace, $rest ) = $regs;
} else {
$slash = $t = $params = $brace = $rest = null;
diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index 813aaca..ac0bae5 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -1543,7 +1543,7 @@
* Replace external links (REL)
*
* Note: this is all very hackish and the order of execution matters a lot.
- * Make sure to run maintenance/parserTests.php if you change this code.
+ * Make sure to run tests/parserTests.php if you change this code.
*
* @private
*
diff --git a/tests/parser/parserTests.txt b/tests/parser/parserTests.txt
index f4a85bc..7347605 100644
--- a/tests/parser/parserTests.txt
+++ b/tests/parser/parserTests.txt
@@ -874,6 +874,43 @@
</p>
!! end
+# <strike> is HTML4, <s> is HTML4/5.
+!! test
+<s> or <strike> for strikethrough (bug 40670)
+!! input
+<strike>strike</strike>
+
+<s>s</s>
+!! result
+<p><strike>strike</strike>
+</p><p><s>s</s>
+</p>
+!! end
+
+!! test
+Non-word characters don't terminate tag names (bug 17663, 40670, 52022)
+!! input
+<b→> doesn't work! </b>
+
+<bä> doesn't work! </b>
+
+<boo> works fine </b>
+
+<s.foo>foo</s>
+
+<s.foo>s.foo</s.foo>
+
+<sub-ID#1>
+!! result
+<p><b→> doesn't work! </b>
+</p><p><bä> doesn't work! </b>
+</p><p><boo> works fine </b>
+</p><p><s.foo>foo</s>
+</p><p><s.foo>s.foo</s.foo>
+</p><p><sub-ID#1>
+</p>
+!! end
+
###
### Special characters
###
@@ -16129,12 +16166,10 @@
!! end
-# This fails in the PHP parser (see bug 40670,
-# https://bugzilla.wikimedia.org/show_bug.cgi?id=40670), so disabled for it.
+# This was a bug in the PHP parser (see bug 40670,
+# https://bugzilla.wikimedia.org/show_bug.cgi?id=40670)
!! test
Tag names followed by punctuation should not be recognized as tags
-!! options
-parsoid
!! input
<s.ome> text
!! result
--
To view, visit https://gerrit.wikimedia.org/r/77907
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iceec404f46703065bf080dd2cbfed1f88c204fa5
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Cscott <[email protected]>