Evan Prodromou wrote:
Karl Fischer wrote:
Hi there,
I run a Laconica server. After a migration to a new VPS, the feeds
seem broken.
Here is an example: http://floss.pro/kmf/rss
So, I've only ever seen this before with API calls. Zach, can you
remember why certain letters were getting wiped out?
My theory is it's because the version of PHP on Karl's new VPS was
compiled with a version of PCRE that doesn't have support for the \p{xx}
escape sequences we use to filter out control characters and UTF-16
surrogates (Fedora/RedHat 5's stock PHP 5.1.6 has this problem).
Spaz probably gets its data via JSON instead of XML, so that's why the
problem doesn't show up with it. Twhirl probably uses XML.
I guess we need a regex that doesn't depend on Unicode property
support in PCRE.
Karl, you could try:
a) installing a version of PHP whose PCRE supports Unicode
properties
b) mucking with the regex in lib/util.php's common_xml_safe_str()
For b) I think '/[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]/S' will get illegal
control chars... but there are some other chars that need to be
stripped, escaped, or replaced (surrogates and possibly some formatting
chars).
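
To make the idea in b) concrete, here is a rough sketch of a filter that
uses only explicit code point ranges, so it needs no \p{xx} support. It is
written in Python for illustration and is not the PHP in lib/util.php. The
name xml_safe_str is a hypothetical stand-in for common_xml_safe_str(), and
the second range is my assumption about the "other chars": it covers the
surrogates plus U+FFFE and U+FFFF, which the XML 1.0 Char production
excludes. Formatting chars are not handled here.

```python
import re

# Control characters that XML 1.0 forbids. Tab (0x09), LF (0x0A) and
# CR (0x0D) are legal, so they are left out of the class.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F]")

# Assumption: the remaining code points outside the XML 1.0 Char
# production, i.e. surrogates (U+D800-U+DFFF) and U+FFFE / U+FFFF.
NON_XML_CHARS = re.compile(r"[\ud800-\udfff\ufffe\uffff]")


def xml_safe_str(text: str) -> str:
    """Hypothetical stand-in for common_xml_safe_str().

    Strips the offending characters outright; escaping or replacing
    them instead would be an equally valid choice.
    """
    text = CONTROL_CHARS.sub("", text)
    return NON_XML_CHARS.sub("", text)
```

With this sketch, xml_safe_str("a\x00b\tc") gives "ab\tc": the NUL is
dropped and the tab is kept.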
Zach
--
Zach Copley <[email protected]>
Control Yourself, Inc.
_______________________________________________
Laconica-dev mailing list
[email protected]
http://mail.laconi.ca/mailman/listinfo/laconica-dev