Physikerwelt has uploaded a new change for review. https://gerrit.wikimedia.org/r/66393
Change subject: Remove non UTF-8 chars from debug output
......................................................................

Remove non UTF-8 chars from debug output

The debug output may contain byte sequences that are not valid UTF-8. The database log produces them, for example, when it prints out binary fields. These bytes are not compatible with the debug toolbar, which then disappears.

bug 48951

PS1: The following questions should be answered before submitting the patch:
* Does removing the characters degrade system performance?
* Are there copyright issues with the regular expression? It comes from http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
* Is there a smarter way to output binary fields, e.g. by using the PHP function bin2hex?
* I had the impression that the debug toolbar does not work even with all valid UTF-8 characters: in one case I had to remove all non-ASCII characters. Later I could not reproduce that behavior.
* Remark: even if database logging is switched off, binary data is written to the log by wfDebug( __METHOD__ . ": Writes done: $sql\n" ); (line 894 of Database.php).

Change-Id: I42f7a5c913b378c05b68970646c75894ca068ed9
---
M includes/debug/Debug.php
1 file changed, 23 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/93/66393/1

diff --git a/includes/debug/Debug.php b/includes/debug/Debug.php
index ec9a62a..82a8cfd 100644
--- a/includes/debug/Debug.php
+++ b/includes/debug/Debug.php
@@ -300,6 +300,28 @@
 	}
 
 	/**
+	 * Helper function to remove non UTF-8 chars from a string.
+	 * @param string $string the input string
+	 * @return string string without non valid UTF-8 chars
+	 */
+	private static function removeNonUtf8CharsFromString( $string ){
+		$regex = <<<'END'
+/
+  (
+    (?: [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
+    |   [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
+    |   [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
+    |   [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
+    ){1,100}                        # ...one or more times
+  )
+| .                                 # anything else
+/x
+END;
+
+		return preg_replace($regex, '$1', $string);
+	}
+
+	/**
 	 * This is a method to pass messages from wfDebug to the pretty debugger.
 	 * Do NOT use this method, use MWDebug::log or wfDebug()
 	 *
@@ -310,7 +332,7 @@
 		global $wgDebugComments, $wgShowDebug;
 
 		if ( self::$enabled || $wgDebugComments || $wgShowDebug ) {
-			self::$debug[] = rtrim( $str );
+			self::$debug[] = rtrim( self::removeNonUtf8CharsFromString( $str ) );
 		}
 	}
 

-- 
To view, visit https://gerrit.wikimedia.org/r/66393

Gerrit-MessageType: newchange
Gerrit-Change-Id: I42f7a5c913b378c05b68970646c75894ca068ed9
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Physikerwelt <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
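For reference, the sketch below wraps the same pattern in a free function so its behavior can be tried outside MediaWiki. The name stripInvalidUtf8() is hypothetical (in the patch the logic is the private MWDebug::removeNonUtf8CharsFromString()), PHP 7 or later is assumed, and only the regex comments are reworded. Note that the pattern checks the shape of each sequence, not full UTF-8 validity: overlong forms such as \xC0\x80, encoded surrogates and lead bytes \xF5-\xF7 are kept even though RFC 3629 forbids them. The {1,100} quantifier limits each match to a run of at most 100 sequences rather than meaning "one or more"; preg_replace() simply continues with the next match, so no well-formed text is lost.

```php
<?php
// Standalone copy of the pattern used by MWDebug::removeNonUtf8CharsFromString()
// in this patch. The function name stripInvalidUtf8() is hypothetical.
// Requires PHP 7+ (scalar type declarations and the ?? operator).
function stripInvalidUtf8( string $string ): string {
	$regex = <<<'END'
/
  (
    (?: [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
    |   [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
    |   [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
    |   [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
    ){1,100}                        # a run of 1 to 100 such sequences
  )
| .                                 # any other single byte
/x
END;

	// A run of UTF-8-shaped sequences is written back via '$1'. A stray byte
	// matches the "." branch instead, group 1 is empty, so the byte is removed.
	// preg_replace() returns null on a PCRE error (e.g. backtrack limit).
	return preg_replace( $regex, '$1', $string ) ?? '';
}

// Example: stray bytes from a binary field are dropped, ASCII text is kept.
echo stripInvalidUtf8( "id=\xFF\xFE42" ), "\n"; // prints: id=42
```

Regarding the performance question: preg_match( '//u', $str ) returns false for malformed UTF-8 (including the overlong example above), so one untested option would be to run the replacement only when that check fails and leave well-formed messages untouched.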

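On the bin2hex question: instead of dropping the rejected bytes, they could be kept visible as escapes. The sketch below is not part of the patch; escapeInvalidUtf8() is a hypothetical name. It reuses the same branches with preg_replace_callback() and turns each rejected byte into a \xHH escape built with bin2hex(). Each escape is plain ASCII, so no new invalid bytes are introduced; the cost is four output characters per rejected byte, and it shares the original pattern's leniency towards overlong forms.

```php
<?php
// Hypothetical alternative to the patch: escape stray bytes instead of
// dropping them. Requires PHP 7+ (scalar type declarations, ?? operator).
function escapeInvalidUtf8( string $string ): string {
	// Group 1 captures a run of byte sequences shaped like UTF-8 (same
	// branches as the patch's pattern); "." then matches any other single byte.
	$regex = '/((?:[\x00-\x7F]|[\xC0-\xDF][\x80-\xBF]|[\xE0-\xEF][\x80-\xBF]{2}'
		. '|[\xF0-\xF7][\x80-\xBF]{3}){1,100})|./';

	$result = preg_replace_callback(
		$regex,
		static function ( array $m ): string {
			// When group 1 did not take part in the match, PHP omits $m[1].
			if ( isset( $m[1] ) && $m[1] !== '' ) {
				return $m[1];
			}
			// A single rejected byte: render it as a visible \xHH escape.
			return '\\x' . bin2hex( $m[0] );
		},
		$string
	);

	// preg_replace_callback() returns null on a PCRE error.
	return $result ?? '';
}

// Example: an invalid byte pair at the end of otherwise plain text.
echo escapeInvalidUtf8( "id=42 \xFF\xFE" ), "\n"; // prints: id=42 \xff\xfe
```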