jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/362169 )
Change subject: Dumps: filter out non-compliant characters (bad PCDATA) from revision text
......................................................................
Dumps: filter out non-compliant characters (bad PCDATA) from revision text
Illegal characters in revision text in the database are replaced with a
space when generating the XML dump file, so that an XML reader won't
break on them.
Bug: T167456
Change-Id: If9770ccd0eccc9aa6304f259dbba5cf3b6272c38
---
M includes/Dump/Exporter.php
1 file changed, 2 insertions(+), 0 deletions(-)
Approvals:
Catrope: Looks good to me, approved
jenkins-bot: Verified
diff --git a/includes/Dump/Exporter.php b/includes/Dump/Exporter.php
index 0f2bfd0..323b308 100644
--- a/includes/Dump/Exporter.php
+++ b/includes/Dump/Exporter.php
@@ -418,6 +418,8 @@
$attribs,
$revision->getContent( $format )
) . "\n";
+ // filter out bad characters that may have crept into old revisions
+ $output = preg_replace( '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $output );
$this->sink->write( $output );
}
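
To try the effect of the added filter in isolation, here is a minimal sketch that wraps the same pattern in a standalone function. The function name filterBadPcdata and the fallback on a null result are assumptions made for illustration; they are not part of the Flow exporter. Note that the character class as merged has no range for U+10000 to U+10FFFF, which XML 1.0 also permits, so characters outside the Basic Multilingual Plane would be replaced as well.

```php
<?php
// Hypothetical helper, not part of the Flow exporter: it wraps the same
// pattern used in the patch so that its behaviour can be tried on its own.
function filterBadPcdata( string $text ): string {
	// Same character class as the patch: tab, LF, CR, U+0020..U+D7FF and
	// U+E000..U+FFFD are kept; every run of anything else is matched.
	$pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u';
	$filtered = preg_replace( $pattern, ' ', $text );
	// preg_replace() returns null on error, for example when the subject is
	// not valid UTF-8 under the /u modifier. Falling back to the input is an
	// assumption of this sketch, not something the patch does.
	return $filtered ?? $text;
}

// A vertical tab (0x0B) followed by a form feed (0x0C) is one run of
// disallowed characters, so the pair is replaced by a single space.
var_dump( filterBadPcdata( "old\x0B\x0Crevision" ) === 'old revision' );

// Tab, carriage return and line feed are in the allowed set and are kept.
var_dump( filterBadPcdata( "a\tb\r\nc" ) === "a\tb\r\nc" );
```

Because the pattern ends in +, a run of consecutive bad characters collapses into one space rather than one space per character, so the filtered text can be shorter than the original.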
--
To view, visit https://gerrit.wikimedia.org/r/362169
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: If9770ccd0eccc9aa6304f259dbba5cf3b6272c38
Gerrit-PatchSet: 2
Gerrit-Project: mediawiki/extensions/Flow
Gerrit-Branch: master
Gerrit-Owner: ArielGlenn <[email protected]>
Gerrit-Reviewer: Catrope <[email protected]>
Gerrit-Reviewer: jenkins-bot <>