jenkins-bot has submitted this change and it was merged.
Change subject: Remove reduntant regex from calls to StringUtils::isUtf8()
......................................................................
Remove reduntant regex from calls to StringUtils::isUtf8()
I've cautiously moved the regex out of the most used code path.
There is no string that will match that regex check that will not also be
passed by mb_check_encoding. I think the regex was intended as a shortcut
evaluation, but it is no faster than mb_check_encoding which will often
need to be run anyway.
I think it could just be deleted, but I have limited motivation to
risk introducing a bug to improve performance on old PHP vesions and
unusual configurations, so I've moved it to the fallback code path.
Change-Id: Ie9425cc23ba032e5aff42beeb44cbb1146050452
---
M includes/StringUtils.php
1 file changed, 5 insertions(+), 4 deletions(-)
Approvals:
Reedy: Looks good to me, approved
jenkins-bot: Verified
diff --git a/includes/StringUtils.php b/includes/StringUtils.php
index 9e21d03..c1545e6 100644
--- a/includes/StringUtils.php
+++ b/includes/StringUtils.php
@@ -51,10 +51,6 @@
*/
static function isUtf8( $value, $disableMbstring = false ) {
$value = (string)$value;
- if ( preg_match( "/[\x80-\xff]/S", $value ) === 0 ) {
- // String contains only ASCII characters, has to be
valid
- return true;
- }
// If the mbstring extension is loaded, use it. However, before
PHP 5.4, values above
// U+10FFFF are incorrectly allowed, so we have to check for
them separately.
@@ -68,6 +64,11 @@
( $newPHP || preg_match(
"/\xf4[\x90-\xbf]|[\xf5-\xff]/S", $value ) === 0 );
}
+ if ( preg_match( "/[\x80-\xff]/S", $value ) === 0 ) {
+ // String contains only ASCII characters, has to be
valid
+ return true;
+ }
+
// PCRE implements repetition using recursion; to avoid a stack
overflow (and segfault)
// for large input, we check for invalid sequences (<= 5 bytes)
rather than valid
// sequences, which can be as long as the input string is.
Multiple short regexes are
--
To view, visit https://gerrit.wikimedia.org/r/62192
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie9425cc23ba032e5aff42beeb44cbb1146050452
Gerrit-PatchSet: 3
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Lwelling <[email protected]>
Gerrit-Reviewer: Reedy <[email protected]>
Gerrit-Reviewer: jenkins-bot
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits