jenkins-bot has submitted this change and it was merged.

Change subject: Don't break autolinks by stripping the final semicolon from an entity.
......................................................................


Don't break autolinks by stripping the final semicolon from an entity.

Autolinking free external links is clever about making sure that trailing
punctuation isn't included in the link.  But if an HTML entity happens to
terminate the URL, the entity's closing semicolon is treated as trailing
punctuation and stripped from the URL, breaking the entity.

Fix this corner case.  This also unifies autolink parsing with Parsoid.
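
As an illustration of the corner case described above, here is a minimal
standalone sketch. The helper name splitTrailingSeparators() and the exact
separator set are assumptions made for this example; they are not taken
from Parser.php. The sketch uses the plain $-anchored regexp quoted in the
patch comment rather than the reversed-string form, to keep it readable.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): split a candidate URL into
 * [ url, trail ], moving trailing punctuation into the trail but leaving
 * the ';' that terminates an HTML entity attached to the URL.
 */
function splitTrailingSeparators( string $url ): array {
    // Assumed separator set for illustration. A ')' only counts as a
    // separator when the URL contains no '('.
    $sep = ',;.:!?';
    if ( strpos( $url, '(' ) === false ) {
        $sep .= ')';
    }

    // Length of the run of separator characters at the end of the URL.
    $numSepChars = strspn( strrev( $url ), $sep );

    // If that run starts with ';' and the text before it looks like an
    // entity body (&name, &#xHEX or &#DEC), keep the ';' in the URL.
    if ( $numSepChars && substr( $url, -$numSepChars, 1 ) === ';' ) {
        $head = substr( $url, 0, strlen( $url ) - $numSepChars );
        if ( preg_match( '/&([a-z]+|#x[\da-f]+|#\d+)$/i', $head ) ) {
            $numSepChars--;
        }
    }

    if ( $numSepChars === 0 ) {
        return [ $url, '' ];
    }
    return [ substr( $url, 0, -$numSepChars ), substr( $url, -$numSepChars ) ];
}
```

With this sketch, a URL ending in an entity such as "&amp;" is returned
unchanged with an empty trail, while a URL ending in a bare ";" still has
the semicolon moved into the trail.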

See: I5ae8435322c78dd1df170d7a3543fff3642759b1
Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
---
M includes/parser/Parser.php
M tests/parser/parserTests.txt
2 files changed, 28 insertions(+), 1 deletion(-)

Approvals:
  Tim Starling: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index a9daa22..ecb14ed 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -1484,7 +1484,20 @@
                        $sep .= ')';
                }
 
-               $numSepChars = strspn( strrev( $url ), $sep );
+               $urlRev = strrev( $url );
+               $numSepChars = strspn( $urlRev, $sep );
+               # Don't break a trailing HTML entity by moving the ; into $trail
+               # This is in hot code, so use substr_compare to avoid having to
+               # create a new string object for the comparison
+               if ( $numSepChars && substr_compare( $url, ";", -$numSepChars, 1 ) === 0) {
+                       # more optimization: instead of running preg_match with a $
+                       # anchor, which can be slow, do the match on the reversed
+                       # string starting at the desired offset.
+                       # un-reversed regexp is: /&([a-z]+|#x[\da-f]+|#\d+)$/i
+                       if ( preg_match( '/\G([a-z]+|[\da-f]+x#|\d+#)&/i', $urlRev, $m2, 0, $numSepChars ) ) {
+                               $numSepChars--;
+                       }
+               }
                if ( $numSepChars ) {
                        $trail = substr( $url, -$numSepChars ) . $trail;
                        $url = substr( $url, 0, -$numSepChars );
diff --git a/tests/parser/parserTests.txt b/tests/parser/parserTests.txt
index c7fc380..63f6a75 100644
--- a/tests/parser/parserTests.txt
+++ b/tests/parser/parserTests.txt
@@ -4171,6 +4171,13 @@
 http://example.com?
 http://example.com)
 http://example.com/url_with_(brackets)
+(http://example.com/url_without_brackets)
+http://example.com/url_with_entity 
+http://example.com/url_with_entity 
+http://example.com/url_with_entity 
+http://example.com/url_with_entity<
+http://example.com/url_with_entity<
+http://example.com/url_with_entity<
 !! html
 <p><a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>,
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>;
@@ -4181,6 +4188,13 @@
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>?
 <a rel="nofollow" class="external free" href="http://example.com">http://example.com</a>)
 <a rel="nofollow" class="external free" href="http://example.com/url_with_(brackets)">http://example.com/url_with_(brackets)</a>
+(<a rel="nofollow" class="external free" href="http://example.com/url_without_brackets">http://example.com/url_without_brackets</a>)
+<a rel="nofollow" class="external free" 
href="http://example.com/url_with_entity ";>http://example.com/url_with_entity 
</a>
+<a rel="nofollow" class="external free" 
href="http://example.com/url_with_entity ";>http://example.com/url_with_entity 
</a>
+<a rel="nofollow" class="external free" 
href="http://example.com/url_with_entity ";>http://example.com/url_with_entity 
</a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity">http://example.com/url_with_entity</a>&lt;
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity%3C">http://example.com/url_with_entity%3C</a>
+<a rel="nofollow" class="external free" href="http://example.com/url_with_entity%3C">http://example.com/url_with_entity%3C</a>
 </p>
 !! end
 

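The patch comments describe matching the entity on the reversed string
with \G at a given offset instead of using a $-anchored regexp on the
original string. The sketch below is only an illustration of that idea:
the two helper functions are hypothetical and do not exist in Parser.php.
It runs both forms side by side so their agreement can be checked on a
few sample inputs. It makes no claim about relative speed.

```php
<?php
// Hypothetical helpers for illustration only.
// $numSepChars is the length of the trailing separator run, whose first
// character is assumed to be the ';' under test.

// Forward form: $-anchored match on the text before the separator run.
function endsWithEntityForward( string $url, int $numSepChars ): bool {
    $head = substr( $url, 0, strlen( $url ) - $numSepChars );
    return preg_match( '/&([a-z]+|#x[\da-f]+|#\d+)$/i', $head ) === 1;
}

// Reversed form, as in the patch: each alternative is written backwards
// and \G pins the match to the starting offset in the reversed string.
function endsWithEntityReversed( string $url, int $numSepChars ): bool {
    $urlRev = strrev( $url );
    return preg_match(
        '/\G([a-z]+|[\da-f]+x#|\d+#)&/i', $urlRev, $m, 0, $numSepChars
    ) === 1;
}

// Sample inputs: [ url, length of trailing separator run ].
$samples = [
    [ 'http://example.com/a&nbsp;', 1 ],
    [ 'http://example.com/a&#xA0;', 1 ],
    [ 'http://example.com/a&#160;', 1 ],
    [ 'http://example.com/a&amp;,', 2 ],
    [ 'http://example.com;', 1 ],
];
foreach ( $samples as [ $url, $n ] ) {
    $forward = endsWithEntityForward( $url, $n );
    $reversed = endsWithEntityReversed( $url, $n );
    echo $url, ' forward=', var_export( $forward, true ),
        ' reversed=', var_export( $reversed, true ), "\n";
}
```
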
-- 
To view, visit https://gerrit.wikimedia.org/r/179185
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I5482782c25e12283030b0fd2150ac55092f7979b
Gerrit-PatchSet: 3
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Cscott <[email protected]>
Gerrit-Reviewer: Bartosz Dziewoński <[email protected]>
Gerrit-Reviewer: CSteipp <[email protected]>
Gerrit-Reviewer: Cscott <[email protected]>
Gerrit-Reviewer: Jackmcbarn <[email protected]>
Gerrit-Reviewer: Subramanya Sastry <[email protected]>
Gerrit-Reviewer: Tim Starling <[email protected]>
Gerrit-Reviewer: Umherirrender <[email protected]>
Gerrit-Reviewer: jenkins-bot <>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
