PleaseStand has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/293910

Change subject: IcuCollation: Remove null terminator from sort key if present
......................................................................

IcuCollation: Remove null terminator from sort key if present

In PHP 5.3.15 and 5.4.5, Collator::getSortKey() was changed so that its
return value no longer ends with a null byte. A corresponding change to
HHVM was not made until May 2016. Thus, sort keys depended not just on
ICU library version but also on PHP/HHVM version (or PECL intl version),
which is undesirable.

* In getSortKey() and getPrimarySortKey(), trim off any null terminator.
* In the return value of fetchFirstLetterData(), use trimmed keys rather
  than untrimmed keys. Because each trimmed key sorts immediately above
  the corresponding untrimmed key, this should be fine; getFirstLetter()
  should still work as expected.
* Bump FIRST_LETTER_VERSION accordingly. Given the sort key format, this
  may not be strictly necessary, but that format is implementation-defined,
  aside from the restriction that only the terminator is a null byte.
* As sort keys will change on some Wikimedia sites, add a migration flag
  to allow scheduling the necessary database updates separately from
  the next MediaWiki update.
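
To illustrate the trimming and the ordering argument above, here is a
minimal sketch. normalizeSortKey() is a hypothetical helper that only
mirrors the logic added to getSortKey() and getPrimarySortKey(), and the
key bytes are made up rather than real ICU output, so the intl extension
is not needed to run it.

```php
<?php
// Hypothetical helper (not part of MediaWiki): strip any trailing null
// bytes, then optionally re-append one, as the migration flag does.
function normalizeSortKey( $rawKey, $appendNull = false ) {
	$sortKey = rtrim( $rawKey, "\0" );
	if ( $appendNull ) {
		$sortKey .= "\0";
	}
	return $sortKey;
}

// Made-up key bytes for illustration only.
$untrimmed = "\x29\x2B\x01\x06\x00"; // shape returned by older intl / HHVM
$trimmed = "\x29\x2B\x01\x06"; // shape returned by newer intl

// Both shapes normalize to the same key, whatever the PHP/HHVM version.
$same = normalizeSortKey( $untrimmed ) === normalizeSortKey( $trimmed );

// In a binary comparison the trimmed key sorts before the untrimmed one,
// and no byte string can fall strictly between "X" and "X\0".
$order = strcmp( $trimmed, $untrimmed );
```

With the flag argument set to true, both inputs instead normalize to the
null-terminated form, which is what lets a wiki keep its old keys until
updateCollation.php has been run.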

Bug: T137642
Change-Id: I241e15985d4f81e2a9c2420dc7301c16e7788512
---
M RELEASE-NOTES-1.27
M includes/DefaultSettings.php
M includes/collation/IcuCollation.php
3 files changed, 50 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/10/293910/1

diff --git a/RELEASE-NOTES-1.27 b/RELEASE-NOTES-1.27
index c13be51..c7b006c 100644
--- a/RELEASE-NOTES-1.27
+++ b/RELEASE-NOTES-1.27
@@ -141,6 +141,10 @@
    those via the web UI. Use UserLoggedIn if you need to do something on all
    logins.
 ** $wgRequirePasswordforEmailChange is removed.
+* (T137642) If $wgCategoryCollation is set to a value other than "uppercase"
+  or "identity", and your site ran a PHP version older than 5.3.15 or 5.4.5,
+  or any version of HHVM, you should run maintenance/updateCollation.php
+  (with the --force option) after upgrading.
 
 === New features in 1.27 ===
 * $wgDataCenterUpdateStickTTL was also added. This decides how long a user
diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php
index 2607797..16c804c 100644
--- a/includes/DefaultSettings.php
+++ b/includes/DefaultSettings.php
@@ -7328,6 +7328,27 @@
  */
 $wgCategoryCollation = 'uppercase';
 
+/**
+ * If the wiki uses a UCA collation, whether to append a null byte to each
+ * sort key.
+ *
+ * Because of a bug, some old versions of PHP's intl extension (including
+ * some versions of HHVM) return the sort key's null terminator as the
+ * last character of the string. Until 1.27, MediaWiki did not strip off
+ * that null byte, which made the sort key not merely dependent on ICU
+ * version, but also on PHP extension version. This has changed.
+ *
+ * This setting is a migration flag, intended for use on large wiki farms,
+ * to allow administrators to schedule the necessary database updates
+ * independently of each other and of MediaWiki updates. It will be
+ * removed in the next MediaWiki version.
+ *
+ * @deprecated since 1.27
+ * @since 1.27
+ * @see https://phabricator.wikimedia.org/T137642
+ */
+$wgAppendNullToUcaSortKeys = false;
+
 /** @} */ # End categories }
 
 /*************************************************************************//**
diff --git a/includes/collation/IcuCollation.php b/includes/collation/IcuCollation.php
index 27f917b..bfaf960 100644
--- a/includes/collation/IcuCollation.php
+++ b/includes/collation/IcuCollation.php
@@ -22,7 +22,7 @@
  * @since 1.16.3
  */
 class IcuCollation extends Collation {
-       const FIRST_LETTER_VERSION = 2;
+       const FIRST_LETTER_VERSION = 3;
 
        /** @var Collator */
        private $primaryCollator;
@@ -193,11 +193,27 @@
        }
 
        public function getSortKey( $string ) {
-               return $this->mainCollator->getSortKey( $string );
+               global $wgAppendNullToUcaSortKeys;
+
+               // Remove the null terminator byte if one is present
+               // https://github.com/facebook/hhvm/issues/7106
+               $sortKey = rtrim( $this->mainCollator->getSortKey( $string ), "\0" );
+               if ( $wgAppendNullToUcaSortKeys ) {
+                       $sortKey .= "\0";
+               }
+               return $sortKey;
        }
 
        public function getPrimarySortKey( $string ) {
-               return $this->primaryCollator->getSortKey( $string );
+               global $wgAppendNullToUcaSortKeys;
+
+               // Remove the null terminator byte if one is present
+               // https://github.com/facebook/hhvm/issues/7106
+               $sortKey = rtrim( $this->primaryCollator->getSortKey( $string ), "\0" );
+               if ( $wgAppendNullToUcaSortKeys ) {
+                       $sortKey .= "\0";
+               }
+               return $sortKey;
        }
 
        public function getFirstLetter( $string ) {
@@ -289,6 +305,9 @@
                $letterMap = [];
                foreach ( $letters as $letter ) {
                        $key = $this->getPrimarySortKey( $letter );
+                       if ( $wgAppendNullToUcaSortKeys ) {
+                               $key = rtrim( $key, "\0" );
+                       }
                        if ( isset( $letterMap[$key] ) ) {
                                // Primary collision
                                // Keep whichever one sorts first in the main collator
@@ -338,11 +357,8 @@
                $prev = false;
                $duplicatePrefixes = [];
                foreach ( $letterMap as $key => $value ) {
-                       // Remove terminator byte. Otherwise the prefix
-                       // comparison will get hung up on that.
-                       $trimmedKey = rtrim( $key, "\0" );
                        if ( $prev === false || $prev === '' ) {
-                               $prev = $trimmedKey;
+                               $prev = $key;
                                // We don't yet have a collation element
                                // to compare against, so continue.
                                continue;
@@ -354,14 +370,14 @@
                        // An element "X" will always sort directly
                        // before "XZ" (Unless we have "XY", but we
                        // do not update $prev in that case).
-                       if ( substr( $trimmedKey, 0, strlen( $prev ) ) === $prev ) {
+                       if ( substr( $key, 0, strlen( $prev ) ) === $prev ) {
                                $duplicatePrefixes[] = $key;
                                // If this is an expansion, we don't want to
                                // compare the next element to this element,
                                // but to what is currently $prev
                                continue;
                        }
-                       $prev = $trimmedKey;
+                       $prev = $key;
                }
                foreach ( $duplicatePrefixes as $badKey ) {
                        wfDebug( "Removing '{$letterMap[$badKey]}' from first letters.\n" );

-- 
To view, visit https://gerrit.wikimedia.org/r/293910
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I241e15985d4f81e2a9c2420dc7301c16e7788512
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: PleaseStand <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
