Physikerwelt has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/66393


Change subject: Remove non UTF-8 chars from debug output
......................................................................

Remove non UTF-8 chars from debug output

The debug output might contain non-UTF-8 chars. These bad characters
are not compatible with the debug toolbar; as a result, the debug
toolbar disappears. Non-UTF-8 chars are produced, for example, by the
database log, which prints out binary fields.

bug 48951

PS1: The following questions should be answered before submitting the patch:
* Does the removal of the chars affect the performance of the system in a
  negative way?
* Are there copyright issues with the regular expression coming from
  http://stackoverflow.com/questions/1401317/remove-non-utf8-characters-from-string
* Is there a smarter way to output binary fields, e.g. by using the PHP
  function bin2hex?
* I had the impression that the debug toolbar does not even work with all
  valid UTF-8 chars. I had an example where I had to remove all non-ASCII
  chars. Later on I could not reproduce that behavior.
* Remark: Even if database logging is switched off, binary information is
  written to the log by
  wfDebug( __METHOD__ . ": Writes done: $sql\n" );
  (line 894 of Database.php)
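
Regarding the bin2hex question above, here is a minimal sketch of what such
an alternative could look like. It is not part of this patch. The function
name hexEncodeIfNotUtf8 is hypothetical, and the sketch assumes that a
string failing PCRE's UTF-8 check (the `//u` idiom) is binary and should be
shown as hex rather than stripped:

```php
<?php
/**
 * Hypothetical helper (not in the patch): leave valid UTF-8 untouched and
 * hex-encode everything else, so binary payloads stay inspectable.
 *
 * @param string $string the input string
 * @return string the input, or "0x" followed by its hex dump
 */
function hexEncodeIfNotUtf8( $string ) {
	// An empty pattern with the /u modifier matches any valid UTF-8
	// subject; on invalid UTF-8, preg_match() returns false instead.
	if ( preg_match( '//u', $string ) === 1 ) {
		return $string;
	}
	// Assumption: the whole string is treated as binary data.
	return '0x' . bin2hex( $string );
}

echo hexEncodeIfNotUtf8( "caf\xC3\xA9" ), "\n";        // valid UTF-8, printed unchanged
echo hexEncodeIfNotUtf8( "\xDE\xAD\xBE\xEF" ), "\n";   // 0xdeadbeef
```

Compared with stripping bytes, this keeps the information in the log, at the
cost of making a mixed text/binary message unreadable as a whole.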

Change-Id: I42f7a5c913b378c05b68970646c75894ca068ed9
---
M includes/debug/Debug.php
1 file changed, 23 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/93/66393/1

diff --git a/includes/debug/Debug.php b/includes/debug/Debug.php
index ec9a62a..82a8cfd 100644
--- a/includes/debug/Debug.php
+++ b/includes/debug/Debug.php
@@ -300,6 +300,28 @@
        }
 
        /**
+        * Helper function to remove non UTF-8 chars from a string.
+        * @param string $string the input string
+        * @return string string without non valid UTF-8 chars
+        */
+       private static function removeNonUtf8CharsFromString( $string ){
+               $regex = <<<'END'
+/
+  (
+    (?: [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
+    |   [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
+    |   [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
+    |   [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
+    ){1,100}                        # ...1 to 100 times per match
+  )
+| .                                 # anything else
+/x
+END;
+       
+               return preg_replace($regex, '$1', $string);
+       }
+
+       /**
         * This is a method to pass messages from wfDebug to the pretty debugger.
         * Do NOT use this method, use MWDebug::log or wfDebug()
         *
@@ -310,7 +332,7 @@
                global $wgDebugComments, $wgShowDebug;
 
                if ( self::$enabled || $wgDebugComments || $wgShowDebug ) {
-                       self::$debug[] = rtrim( $str );
+                       self::$debug[] = rtrim( self::removeNonUtf8CharsFromString( $str ) );
                }
        }
 

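To make the behavior of the pattern in the diff above easier to check, here
is a standalone sketch that copies the regular expression into a free
function. The name stripNonUtf8 is hypothetical and exists only for this
illustration; the pattern itself is taken from the patch. The last call
shows one edge case worth knowing: the byte ranges are purely structural, so
an overlong sequence such as C0 80 passes through even though it is not
valid UTF-8.

```php
<?php
/**
 * Standalone copy of the pattern from the patch (hypothetical name).
 * Runs of structurally well-formed sequences are captured in group 1 and
 * kept; any other single byte is matched by "." and dropped, because
 * group 1 is then unset and '$1' expands to an empty string.
 */
function stripNonUtf8( $string ) {
	$regex = <<<'END'
/
  (
    (?: [\x00-\x7F]                 # single-byte sequences
    |   [\xC0-\xDF][\x80-\xBF]      # double-byte sequences
    |   [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences
    |   [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequences
    ){1,100}
  )
| .                                 # anything else
/x
END;
	return preg_replace( $regex, '$1', $string );
}

var_dump( stripNonUtf8( "a\xFFb" ) === 'ab' );                // stray byte removed
var_dump( stripNonUtf8( "caf\xC3\xA9" ) === "caf\xC3\xA9" );  // valid text kept
var_dump( stripNonUtf8( "a\xE2\x82" ) === 'a' );              // truncated sequence removed
var_dump( stripNonUtf8( "\xC0\x80" ) === "\xC0\x80" );        // overlong form is NOT removed
```

If strict validity matters for the toolbar, the result could additionally be
checked with preg_match( '//u', ... ); whether that is needed here is an
open question, not something this patch establishes.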
-- 
To view, visit https://gerrit.wikimedia.org/r/66393
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I42f7a5c913b378c05b68970646c75894ca068ed9
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/core
Gerrit-Branch: master
Gerrit-Owner: Physikerwelt <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits
