jenkins-bot has submitted this change and it was merged.

Change subject: Don't allow embedded newlines in magic links, but do allow &nbsp;
......................................................................


Don't allow embedded newlines in magic links, but do allow &nbsp;

This continues the work started in T67278 to make magic link parsing
more consistent with wiki text parsing in general, and closes two
long-standing bugs.

Bug: T30950
Bug: T31025
Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
---
M RELEASE-NOTES-1.25
M includes/parser/Parser.php
M tests/parser/parserTests.txt
3 files changed, 61 insertions(+), 7 deletions(-)

Approvals:
  Tim Starling: Looks good to me, approved
  jenkins-bot: Verified
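
As a rough illustration of the rule described in the commit message, the
following standalone PHP sketch applies the RFC/PMID branch of the new pattern
outside the parser. The SPACE_NOT_NL pattern is copied from the diff below; the
function name findMagicNumber() is hypothetical and is not part of Parser.php.

```php
<?php
// Copied from the SPACE_NOT_NL constant added in this change.
const SPACE_NOT_NL = '(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})';

/**
 * Hypothetical helper (not part of Parser.php): return the number that
 * follows an RFC or PMID keyword, or null if the text is not a magic link.
 */
function findMagicNumber( string $text ): ?string {
	$space = SPACE_NOT_NL;
	$spaces = "$space++"; // possessive match of one or more spaces
	// Extended (x) mode ignores the literal blanks in the pattern itself.
	$regex = "!\\b(?:RFC|PMID) $spaces ([0-9]+)\\b!xu";
	return preg_match( $regex, $text, $m ) === 1 ? $m[1] : null;
}
```

Under these assumptions, findMagicNumber( 'RFC&nbsp;822' ) returns '822',
while the same keyword and number separated by a newline return null.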



diff --git a/RELEASE-NOTES-1.25 b/RELEASE-NOTES-1.25
index 1956eb6..9183e44 100644
--- a/RELEASE-NOTES-1.25
+++ b/RELEASE-NOTES-1.25
@@ -279,6 +279,8 @@
    However, this difference is unlikely to arise in practice.
 * (T67278) RFC, PMID, and ISBN "magic links" must be surrounded by non-word
   characters on both sides.
+* (T30950, T31025) RFC, PMID, and ISBN "magic links" can no longer contain
+  newlines; but they can contain &nbsp; and other non-newline whitespace.
 
 == Compatibility ==
 
diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index ecb14ed..e3a4ea5 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -90,6 +90,9 @@
        const EXT_IMAGE_REGEX = '/^(http:\/\/|https:\/\/)([^][<>"\\x00-\\x20\\x7F\p{Zs}]+)
                \\/([A-Za-z0-9_.,~%\\-+&;#*?!=()@\\x80-\\xFF]+)\\.((?i)gif|png|jpg|jpeg)$/Sxu';
 
+       # Regular expression for a non-newline space
+       const SPACE_NOT_NL = '(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})';
+
        # State constants for the definition list colon extraction
        const COLON_STATE_TEXT = 0;
        const COLON_STATE_TAG = 1;
@@ -1389,18 +1392,22 @@
                wfProfileIn( __METHOD__ );
                $prots = wfUrlProtocolsWithoutProtRel();
                $urlChar = self::EXT_LINK_URL_CLASS;
+               $space = self::SPACE_NOT_NL; #  non-newline space
+               $spdash = "(?:-|$space)"; # a dash or a non-newline space
+               $spaces = "$space++"; # possessive match of 1 or more spaces
                $text = preg_replace_callback(
                        '!(?:                           # Start cases
                                (<a[ \t\r\n>].*?</a>) |     # m[1]: Skip link text
                                (<.*?>) |                   # m[2]: Skip stuff inside HTML elements' . "
-                               (\b(?i:$prots)$urlChar+) |  # m[3]: Free external links" . '
-                               \b(?:RFC|PMID)\s+([0-9]+)\b |# m[4]: RFC or PMID, capture number
-                               \bISBN\s+(                  # m[5]: ISBN, capture number
-                                       (?: 97[89] [\ \-]? )?   # optional 13-digit ISBN prefix
-                                       (?: [0-9]  [\ \-]? ){9} # 9 digits with opt. delimiters
+                               (\b(?i:$prots)$urlChar+) |  # m[3]: Free external links
+                               \b(?:RFC|PMID) $spaces      # m[4]: RFC or PMID, capture number
+                                       ([0-9]+)\b |
+                               \bISBN $spaces (            # m[5]: ISBN, capture number
+                                       (?: 97[89] $spdash? )?   # optional 13-digit ISBN prefix
+                                       (?: [0-9]  $spdash? ){9} # 9 digits with opt. delimiters
                                        [0-9Xx]                 # check digit
-                                       )\b
-                       )!xu', array( &$this, 'magicLinkCallback' ), $text );
+                               )\b
+                       )!xu", array( &$this, 'magicLinkCallback' ), $text );
                wfProfileOut( __METHOD__ );
                return $text;
        }
@@ -1441,6 +1448,8 @@
                } elseif ( isset( $m[5] ) && $m[5] !== '' ) {
                        # ISBN
                        $isbn = $m[5];
+                       $space = self::SPACE_NOT_NL; #  non-newline space
+                       $isbn = preg_replace( "/$space/", ' ', $isbn );
                        $num = strtr( $isbn, array(
                                '-' => '',
                                ' ' => '',
diff --git a/tests/parser/parserTests.txt b/tests/parser/parserTests.txt
index f7dc0a9..cf9d829 100644
--- a/tests/parser/parserTests.txt
+++ b/tests/parser/parserTests.txt
@@ -8935,6 +8935,19 @@
 !! end
 
 !! test
+Magic links: RFC (w/ non-newline whitespace, bug 28950/29025)
+!! wikitext
+RFC &nbsp;&#160;&#0160;&#xA0;&#Xa0; 822
+RFC
+822
+!! html
+<p><a class="external mw-magiclink-rfc" rel="nofollow" href="//tools.ietf.org/html/rfc822">RFC 822</a>
+RFC
+822
+</p>
+!! end
+
+!! test
 Magic links: ISBN (bug 1937)
 !! wikitext
 ISBN 0-306-40615-2
@@ -8953,6 +8966,23 @@
 !! end
 
 !! test
+Magic links: ISBN (w/ non-newline whitespace, bug 28950/29025)
+!! wikitext
+ISBN &nbsp;&#160;&#0160;&#xA0;&#Xa0; 978&nbsp;0&#160;316&#0160;09811&#xA0;3
+ISBN
+9780316098113
+ISBN 978
+0316098113
+!! html
+<p><a href="/wiki/Special:BookSources/9780316098113" class="internal mw-magiclink-isbn">ISBN 978 0 316 09811 3</a>
+ISBN
+9780316098113
+ISBN 978
+0316098113
+</p>
+!! end
+
+!! test
 Magic links: PMID incorrectly converts space to underscore
 !! wikitext
 PMID 1234
@@ -8970,6 +9000,19 @@
 </p>
 !! end
 
+!! test
+Magic links: PMID (w/ non-newline whitespace, bug 28950/29025)
+!! wikitext
+PMID &nbsp;&#160;&#0160;&#xA0;&#Xa0; 1234
+PMID
+1234
+!! html
+<p><a class="external mw-magiclink-pmid" rel="nofollow" href="//www.ncbi.nlm.nih.gov/pubmed/1234?dopt=Abstract">PMID 1234</a>
+PMID
+1234
+</p>
+!! end
+
 ###
 ### Templates
 ####
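
The ISBN handling added to magicLinkCallback() in the diff above can be
sketched in isolation as follows. This is only an illustration: the function
name normalizeIsbn() is hypothetical, only the two strtr() entries visible in
the diff are reproduced, and the sketch adds the "u" modifier (an assumption;
the change itself uses "/$space/") so that \p{Zs} is matched against whole
UTF-8 code points.

```php
<?php
/**
 * Hypothetical helper (not part of Parser.php): turn a matched ISBN into
 * its display text and the bare number used for the link target.
 */
function normalizeIsbn( string $isbn ): array {
	// Same pattern as the SPACE_NOT_NL constant added in this change.
	$space = '(?:\t|&nbsp;|&\#0*160;|&\#[Xx]0*[Aa]0;|\p{Zs})';
	// Replace each entity or space character with one plain space.
	$display = preg_replace( "/$space/u", ' ', $isbn );
	// Strip the separators to obtain the number.
	$num = strtr( $display, array( '-' => '', ' ' => '' ) );
	return array( $display, $num );
}
```

With the input from the new parser test,
normalizeIsbn( '978&nbsp;0&#160;316&#0160;09811&#xA0;3' ) gives the display
text '978 0 316 09811 3' and the number '9780316098113'.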

-- 
To view, visit https://gerrit.wikimedia.org/r/133651
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I71f8b337543163569c64bbfdec154eb9b69d7264
Gerrit-PatchSet: 7
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Cscott <[email protected]>
Gerrit-Reviewer: Bartosz Dziewoński <[email protected]>
Gerrit-Reviewer: Cscott <[email protected]>
Gerrit-Reviewer: Daniel Friesen <[email protected]>
Gerrit-Reviewer: Subramanya Sastry <[email protected]>
Gerrit-Reviewer: Tim Starling <[email protected]>
Gerrit-Reviewer: Umherirrender <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
