PleaseStand has uploaded a new change for review.
https://gerrit.wikimedia.org/r/293910
Change subject: IcuCollation: Remove null terminator from sort key if present
......................................................................
IcuCollation: Remove null terminator from sort key if present
In PHP 5.3.15 and 5.4.5, there was a change to Collator::getSortKey():
its return value no longer ends with a null byte. A corresponding
change to HHVM was not made until May 2016. Thus, sort keys depended not
just on ICU library version but also on PHP/HHVM version (or PECL intl
version), which is undesirable.
* In getSortKey() and getPrimarySortKey(), trim off any null terminator.
* In the return value of fetchFirstLetterData(), use trimmed keys rather
than untrimmed keys. Because each trimmed key sorts immediately before
the corresponding untrimmed key, this should be fine; getFirstLetter()
should still work as expected.
* Bump FIRST_LETTER_VERSION accordingly. Given the sort key format, this
may not be strictly necessary, but that format is implementation-defined,
aside from the restriction that only the terminator is a null byte.
* As sort keys will change on some Wikimedia sites, add a migration flag
to allow scheduling the necessary database updates separately from
the next MediaWiki update.
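A minimal sketch of the normalization described in the list above, for
illustration only. It does not call the intl extension: normalizeSortKey()
is a hypothetical stand-alone helper, and the byte strings are made-up
stand-ins for real ICU sort keys.

```php
<?php
// Hypothetical helper (not part of IcuCollation): strips the trailing
// null terminator, then optionally re-appends one for migration mode.
function normalizeSortKey( $rawKey, $appendNull = false ) {
	// rtrim() removes every trailing null byte; per the commit message,
	// a sort key contains a null byte only as its terminator.
	$key = rtrim( $rawKey, "\0" );
	return $appendNull ? $key . "\0" : $key;
}

// Made-up sample key, as an old intl version might return it.
$untrimmed = "\x2a\x2c\x01\x06\x00";
$trimmed = normalizeSortKey( $untrimmed );

// The terminator is gone.
var_dump( $trimmed === "\x2a\x2c\x01\x06" );
// In a byte-wise comparison the trimmed key sorts before the untrimmed one.
var_dump( strcmp( $trimmed, $untrimmed ) < 0 );
// Migration mode restores exactly one terminator.
var_dump( normalizeSortKey( $trimmed, true ) === $untrimmed );
```

Each var_dump() call prints bool(true). Because the helper is idempotent,
applying it to a key that was already trimmed leaves the key unchanged.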
Bug: T137642
Change-Id: I241e15985d4f81e2a9c2420dc7301c16e7788512
---
M RELEASE-NOTES-1.27
M includes/DefaultSettings.php
M includes/collation/IcuCollation.php
3 files changed, 50 insertions(+), 9 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/10/293910/1
diff --git a/RELEASE-NOTES-1.27 b/RELEASE-NOTES-1.27
index c13be51..c7b006c 100644
--- a/RELEASE-NOTES-1.27
+++ b/RELEASE-NOTES-1.27
@@ -141,6 +141,10 @@
those via the web UI. Use UserLoggedIn if you need to do something on all
logins.
** $wgRequirePasswordforEmailChange is removed.
+* (T137642) If $wgCategoryCollation is set to a value other than "uppercase"
+ or "identity", and your site ran a PHP version older than 5.3.15 or 5.4.5,
+ or any version of HHVM, you should run maintenance/updateCollation.php
+ (with the --force option) after upgrading.
=== New features in 1.27 ===
* $wgDataCenterUpdateStickTTL was also added. This decides how long a user
diff --git a/includes/DefaultSettings.php b/includes/DefaultSettings.php
index 2607797..16c804c 100644
--- a/includes/DefaultSettings.php
+++ b/includes/DefaultSettings.php
@@ -7328,6 +7328,27 @@
*/
$wgCategoryCollation = 'uppercase';
+/**
+ * If the wiki uses a UCA collation, whether to append a null byte to each
+ * sort key.
+ *
+ * Because of a bug, some old versions of PHP's intl extension (including
+ * some versions of HHVM) return the sort key's null terminator as the
+ * last character of the string. Until 1.27, MediaWiki did not strip off
+ * that null byte, which made the sort key not merely dependent on ICU
+ * version, but also on PHP extension version. This has changed.
+ *
+ * This setting is a migration flag, intended for use on large wiki farms,
+ * to allow administrators to schedule the necessary database updates
+ * independently of each other and of MediaWiki updates. It will be
+ * removed in the next MediaWiki version.
+ *
+ * @deprecated since 1.27
+ * @since 1.27
+ * @see https://phabricator.wikimedia.org/T137642
+ */
+$wgAppendNullToUcaSortKeys = false;
+
/** @} */ # End categories }
/*************************************************************************//**
diff --git a/includes/collation/IcuCollation.php b/includes/collation/IcuCollation.php
index 27f917b..bfaf960 100644
--- a/includes/collation/IcuCollation.php
+++ b/includes/collation/IcuCollation.php
@@ -22,7 +22,7 @@
* @since 1.16.3
*/
class IcuCollation extends Collation {
- const FIRST_LETTER_VERSION = 2;
+ const FIRST_LETTER_VERSION = 3;
/** @var Collator */
private $primaryCollator;
@@ -193,11 +193,27 @@
}
public function getSortKey( $string ) {
- return $this->mainCollator->getSortKey( $string );
+ global $wgAppendNullToUcaSortKeys;
+
+ // Remove the null terminator byte if one is present
+ // https://github.com/facebook/hhvm/issues/7106
+ $sortKey = rtrim( $this->mainCollator->getSortKey( $string ), "\0" );
+ if ( $wgAppendNullToUcaSortKeys ) {
+ $sortKey .= "\0";
+ }
+ return $sortKey;
}
public function getPrimarySortKey( $string ) {
- return $this->primaryCollator->getSortKey( $string );
+ global $wgAppendNullToUcaSortKeys;
+
+ // Remove the null terminator byte if one is present
+ // https://github.com/facebook/hhvm/issues/7106
+ $sortKey = rtrim( $this->primaryCollator->getSortKey( $string ), "\0" );
+ if ( $wgAppendNullToUcaSortKeys ) {
+ $sortKey .= "\0";
+ }
+ return $sortKey;
}
public function getFirstLetter( $string ) {
@@ -289,6 +305,9 @@
$letterMap = [];
foreach ( $letters as $letter ) {
$key = $this->getPrimarySortKey( $letter );
+ if ( $GLOBALS['wgAppendNullToUcaSortKeys'] ) {
+ $key = rtrim( $key, "\0" );
+ }
if ( isset( $letterMap[$key] ) ) {
// Primary collision
// Keep whichever one sorts first in the main collator
@@ -338,11 +357,8 @@
$prev = false;
$duplicatePrefixes = [];
foreach ( $letterMap as $key => $value ) {
- // Remove terminator byte. Otherwise the prefix
- // comparison will get hung up on that.
- $trimmedKey = rtrim( $key, "\0" );
if ( $prev === false || $prev === '' ) {
- $prev = $trimmedKey;
+ $prev = $key;
// We don't yet have a collation element
// to compare against, so continue.
continue;
@@ -354,14 +370,14 @@
// An element "X" will always sort directly
// before "XZ" (Unless we have "XY", but we
// do not update $prev in that case).
- if ( substr( $trimmedKey, 0, strlen( $prev ) ) === $prev ) {
+ if ( substr( $key, 0, strlen( $prev ) ) === $prev ) {
$duplicatePrefixes[] = $key;
// If this is an expansion, we don't want to
// compare the next element to this element,
// but to what is currently $prev
continue;
}
- $prev = $trimmedKey;
+ $prev = $key;
}
foreach ( $duplicatePrefixes as $badKey ) {
wfDebug( "Removing '{$letterMap[$badKey]}' from first letters.\n" );
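The prefix check in the last hunk can be sketched on its own as follows.
This is an illustrative approximation, not the code in IcuCollation:
removeDuplicatePrefixes() is a hypothetical function, the keys are made-up
byte strings, and the byte-wise ksort() is an assumption about how the map
is ordered before the loop runs.

```php
<?php
// Hypothetical stand-alone version of the duplicate-prefix removal.
function removeDuplicatePrefixes( array $letterMap ) {
	// Assumed ordering: byte-wise, so a key sorts directly before any
	// longer key that starts with it.
	ksort( $letterMap, SORT_STRING );
	$prev = false;
	foreach ( $letterMap as $key => $value ) {
		$key = (string)$key;
		if ( $prev === false || $prev === '' ) {
			// Nothing to compare against yet.
			$prev = $key;
			continue;
		}
		if ( substr( $key, 0, strlen( $prev ) ) === $prev ) {
			// $key expands $prev: drop it and keep comparing against $prev.
			unset( $letterMap[$key] );
			continue;
		}
		$prev = $key;
	}
	return $letterMap;
}

// Trimmed keys: the expansion "\x2a\x2c" is detected and removed.
$trimmed = [ "\x2a" => 'A', "\x2a\x2c" => 'Ab', "\x2c" => 'B' ];
var_dump( array_values( removeDuplicatePrefixes( $trimmed ) ) === [ 'A', 'B' ] );

// Untrimmed keys: the terminator hides the shared prefix, so nothing is
// removed. This is why the keys must be trimmed before the comparison.
$untrimmed = [ "\x2a\x00" => 'A', "\x2a\x2c\x00" => 'Ab', "\x2c\x00" => 'B' ];
var_dump( count( removeDuplicatePrefixes( $untrimmed ) ) === 3 );
```

Both var_dump() calls print bool(true).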
--
To view, visit https://gerrit.wikimedia.org/r/293910
Gerrit-MessageType: newchange
Gerrit-Change-Id: I241e15985d4f81e2a9c2420dc7301c16e7788512
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: PleaseStand