ArielGlenn has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/362169 )
Change subject: filter out non-compliant characters (bad PCDATA) from revision text
......................................................................
filter out non-compliant characters (bad PCDATA) from revision text
Illegal characters in revision text in the database get replaced with
a space when generating the XML dump file, so that an XML reader won't
break on them.
Bug: T167456
Change-Id: If9770ccd0eccc9aa6304f259dbba5cf3b6272c38
---
M includes/Dump/Exporter.php
1 file changed, 2 insertions(+), 0 deletions(-)
git pull ssh://gerrit.wikimedia.org:29418/mediawiki/extensions/Flow refs/changes/69/362169/1
diff --git a/includes/Dump/Exporter.php b/includes/Dump/Exporter.php
index 0f2bfd0..323b308 100644
--- a/includes/Dump/Exporter.php
+++ b/includes/Dump/Exporter.php
@@ -418,6 +418,8 @@
$attribs,
$revision->getContent( $format )
) . "\n";
+ // filter out bad characters that may have crept into old revisions
+ $output = preg_replace( '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u', ' ', $output );
$this->sink->write( $output );
}
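
For illustration only, the effect of the added preg_replace() call can be sketched in isolation. The function name replaceBadPcdata() and the sample strings are hypothetical and not part of the change; the pattern is copied from the patch. Note the nullable return: with the /u modifier, preg_replace() returns null when the subject is not valid UTF-8.

```php
<?php
/**
 * Hypothetical standalone wrapper around the pattern used in the patch.
 * Each run of characters outside the allowed ranges becomes one space.
 *
 * @param string $text Revision text, expected to be valid UTF-8
 * @return string|null Filtered text, or null if preg_replace() fails
 */
function replaceBadPcdata( string $text ): ?string {
	// Allowed: tab, LF, CR, U+0020..U+D7FF and U+E000..U+FFFD.
	// As written, the pattern also replaces characters above U+FFFD,
	// although XML 1.0 permits U+10000..U+10FFFF.
	$pattern = '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}]+/u';
	return preg_replace( $pattern, ' ', $text );
}

// A run of two control characters collapses to a single space.
var_dump( replaceBadPcdata( "a\x01\x02b" ) === 'a b' );
// Tab, newline and carriage return are left untouched.
var_dump( replaceBadPcdata( "x\ty\r\nz" ) === "x\ty\r\nz" );
// U+FFFE is outside the allowed ranges and is replaced.
var_dump( replaceBadPcdata( "a\u{FFFE}b" ) === 'a b' );
```

Because of the trailing + in the pattern, consecutive bad characters yield one space rather than one space each, so character offsets in the dumped text can differ from those in the stored revision.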
--
To view, visit https://gerrit.wikimedia.org/r/362169
To unsubscribe, visit https://gerrit.wikimedia.org/r/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: If9770ccd0eccc9aa6304f259dbba5cf3b6272c38
Gerrit-PatchSet: 1
Gerrit-Project: mediawiki/extensions/Flow
Gerrit-Branch: master
Gerrit-Owner: ArielGlenn <[email protected]>
_______________________________________________
MediaWiki-commits mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-commits