Evan Prodromou wrote:
Karl Fischer wrote:
Hi there,
I run a Laconica server. After a migration to a new VPS, the feeds seem broken.
Here is an example: http://floss.pro/kmf/rss

So, I've only ever seen this before with API calls. Zach, can you remember why those certain letters were getting wiped out?
My theory is that the PHP on Karl's new VPS was compiled against a version of PCRE that lacks support for the \p{xx} escape sequences we use to filter out control characters and UTF-16 surrogates (Fedora/RedHat 5's stock PHP 5.1.6 has this problem).
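
One way to test that theory on the VPS might be a quick check like the sketch below. The function name pcre_has_unicode_props() is made up for illustration and is not part of Laconica. It relies on preg_match() returning false when a pattern fails to compile, which is what should happen with \p{..} on a PCRE built without Unicode property support.

```php
<?php
// Hypothetical helper (not part of Laconica): reports whether this PHP's
// PCRE can compile a pattern that uses a \p{..} Unicode property escape.
function pcre_has_unicode_props()
{
    // preg_match() returns false and raises a warning when the pattern
    // cannot be compiled; @ suppresses the warning so we only see the result.
    // On a working build, 'a' matches \p{L} (any letter) and we get 1.
    return @preg_match('/\p{L}/u', 'a') === 1;
}
```

Running var_dump(pcre_has_unicode_props()); on both the old and new machines should show whether the PCRE build is the difference.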

Spaz probably gets its data via JSON instead of XML, so that's why the problem doesn't show up with it. Twhirl probably uses XML.

I guess we need a regex that doesn't depend on Unicode property support in PCRE.

Karl, you could try:

a) installing a version of PHP with a PCRE that supports the unicode properties
b) mucking with the regex in lib/util.php's common_xml_safe_str()

For b) I think '/[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]/S' will get illegal control chars... but there are some other chars that need to be stripped, escaped, or replaced (surrogates and possibly some formatting chars).
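
A rough sketch of how b) might be extended to cover surrogates as well, without \p{..} or the /u modifier. This is an assumption-laden illustration, not the actual body of common_xml_safe_str(): the function name xml_safe_str_bytes() is hypothetical, the input is assumed to be UTF-8, and it simply strips the offending bytes rather than escaping or replacing them. It does not attempt to handle formatting chars.

```php
<?php
// Hypothetical helper, NOT Laconica's common_xml_safe_str().
// Matches on raw bytes (no /u, no \p{..}), so it should not need
// Unicode property support in PCRE. Assumes $str is UTF-8 encoded.
function xml_safe_str_bytes($str)
{
    $patterns = array(
        // Control chars that XML 1.0 forbids; tab, LF and CR are kept.
        // These are always single bytes in UTF-8.
        '/[\x00-\x08\x0B\x0C\x0E-\x1F]/',
        // Surrogates U+D800..U+DFFF as they appear when (wrongly)
        // encoded in UTF-8: bytes ED, A0..BF, 80..BF.
        '/\xED[\xA0-\xBF][\x80-\xBF]/',
        // U+FFFE and U+FFFF, also not allowed in XML 1.0:
        // bytes EF BF BE and EF BF BF.
        '/\xEF\xBF[\xBE\xBF]/',
    );

    // Strip every match; other multi-byte characters pass through untouched.
    return preg_replace($patterns, '', $str);
}
```

Because 0xED and 0xEF only ever occur as lead bytes in valid UTF-8, the byte-level patterns should not clip pieces out of neighbouring characters.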

Zach

--
Zach Copley <[email protected]>
Control Yourself, Inc.

_______________________________________________
Laconica-dev mailing list
[email protected]
http://mail.laconi.ca/mailman/listinfo/laconica-dev
