Awight has uploaded a new change for review.

  https://gerrit.wikimedia.org/r/178972

Change subject: Parse XML charset declaration
......................................................................

Parse XML charset declaration

Now it's possible to consume text containing latin-1 characters.

Change-Id: I5bba92bc195602ada2f5cc0bb71a9e7c0ccd34ed
---
M sites/all/modules/wmf_audit/wmf_audit.module
1 file changed, 6 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.wikimedia.org:29418/wikimedia/fundraising/crm refs/changes/72/178972/1

diff --git a/sites/all/modules/wmf_audit/wmf_audit.module b/sites/all/modules/wmf_audit/wmf_audit.module
index 02a007b..ecc6cb1 100644
--- a/sites/all/modules/wmf_audit/wmf_audit.module
+++ b/sites/all/modules/wmf_audit/wmf_audit.module
@@ -1006,7 +1006,10 @@
     //look for the raw xml
     $full_xml = false;
     $node = wmfa_execute('get_log_line_xml_outermost_node');
-    $xmlstart = strpos($line, "<$node>");
+    $xmlstart = strpos($line, '<?xml');
+    if ($xmlstart === false) {
+      $xmlstart = strpos($line, "<$node>");
+    }
     $xmlend = strpos($line, "</$node>");
     if ($xmlend) {
       $full_xml = true;
@@ -1016,6 +1019,8 @@
       //this is a broken line, and it won't load... but we can still parse what's left of the thing, the slow way.
       $xml = substr($line, $xmlstart);
     }
+    // Syslog wart.  Other control characters should be encoded normally.
+    $xml = str_replace( '#012', "\n", $xml );
 
     $donor_data = array();
 

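To illustrate why starting the substring at the XML declaration matters, here is a minimal standalone sketch, not the module's actual code. The helper name sketch_extract_xml(), the sample log line, and the use of DOMDocument are all assumptions made for illustration; the patch does not show which parser wmf_audit uses. The point is only that keeping the `<?xml ... encoding="..."?>` prolog lets the parser decode latin-1 bytes, and that the syslog `#012` escape has to be turned back into a newline first.

```php
<?php
// Hypothetical helper (not part of wmf_audit.module): extract the XML payload
// from a syslog line, keeping the XML declaration when one is present.
function sketch_extract_xml($line, $node) {
    // Prefer the declaration so that any encoding="..." attribute is kept.
    $start = strpos($line, '<?xml');
    if ($start === false) {
        // No declaration: fall back to the outermost node.
        $start = strpos($line, "<$node>");
    }
    if ($start === false) {
        return false;
    }
    $close = "</$node>";
    $end = strpos($line, $close);
    if ($end === false) {
        // Truncated line: keep whatever is left.
        $xml = substr($line, $start);
    } else {
        $xml = substr($line, $start, $end + strlen($close) - $start);
    }
    // Syslog writes a newline as the literal text "#012".
    return str_replace('#012', "\n", $xml);
}

// Made-up sample line; \xE9 is "e with acute accent" in ISO-8859-1.
$line = "Dec 10 host app: <?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>#012"
      . "<order><name>Jos\xE9</name></order>";

$xml = sketch_extract_xml($line, 'order');

// Assumption: DOMDocument stands in for whatever parser the module uses.
$doc = new DOMDocument();
$loaded = $doc->loadXML($xml);
$name = $doc->getElementsByTagName('name')->item(0)->textContent;

// DOM hands text back as UTF-8, so the latin-1 byte is transcoded.
echo bin2hex($name), "\n";
```

Without the declaration, the same bytes would be read as (invalid) UTF-8, which is presumably the failure this change addresses.
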
-- 
To view, visit https://gerrit.wikimedia.org/r/178972
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I5bba92bc195602ada2f5cc0bb71a9e7c0ccd34ed
Gerrit-PatchSet: 1
Gerrit-Project: wikimedia/fundraising/crm
Gerrit-Branch: master
Gerrit-Owner: Awight <[email protected]>

_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits