I just committed a workaround to HTML Purifier master (unreleased). We think
this is an upstream PHP/glibc bug.
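
For anyone who needs something before a release, here is a minimal sketch of one possible workaround. It assumes the truncation comes from iconv() with //IGNORE on long inputs, and it is not necessarily what the commit does. The function name iconv_chunked and the default chunk size of 8000 bytes are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper, not part of HTML Purifier's API.
// Assumption: converting short pieces avoids the truncation seen on long
// inputs. Each piece is cut on a UTF-8 character boundary.
function iconv_chunked($text, $to, $chunk = 8000)
{
    $out = '';
    $len = strlen($text);
    $pos = 0;
    while ($pos < $len) {
        $end = min($pos + $chunk, $len);
        // If the byte at the cut is a continuation byte (10xxxxxx), move the
        // cut left so a multibyte character is never split across pieces.
        while ($end < $len && $end > $pos + 1
               && (ord($text[$end]) & 0xC0) === 0x80) {
            $end--;
        }
        $piece = @iconv('UTF-8', $to . '//IGNORE',
                        substr($text, $pos, $end - $pos));
        if ($piece === false) {
            return false; // let the caller decide how to handle a failure
        }
        $out .= $piece;
        $pos = $end;
    }
    return $out;
}

// Usage: convert a long UTF-8 string to ISO-8859-1 piece by piece.
$in  = "\xC3\xA9" . str_repeat(".", 50000); // "é" followed by 50000 dots
$out = iconv_chunked($in, "ISO-8859-1");
echo "in: " . strlen($in) . "\n";
echo "out: " . ($out === false ? "false" : strlen($out)) . "\n";
```

How iconv() treats characters that the target encoding cannot represent differs between PHP and glibc versions, so the usage above deliberately sticks to a character that ISO-8859-1 can encode.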

Edward

Excerpts from Jörg Ludwig's message of Sat Jun 25 13:24:40 -0400 2011:
> Package: php-htmlpurifier
> Version: 4.3.0+dfsg1-1
> Severity: important
>
> We use HTML Purifier to clean up HTML mails from customers before displaying
> them. Under certain circumstances an ISO-8859-1 HTML string is cut off in the
> middle. The following script reproduces the problem:
>
> <?php
> require_once "HTMLPurifier.auto.php";
>
> $in = "&#8364;".str_repeat(".", 50000);
>
> $cfg = HTMLPurifier_Config::createDefault();
> $cfg->set("Core.Encoding", "iso-8859-1");
> $purifier = new HTMLPurifier($cfg);
> $out = $purifier->purify($in);
>
> echo "in: ".strlen($in)."<br>";
> echo "out: ".strlen($out)."<br>";
> echo $out;
>
>
> Output:
> in: 50007
> out: 8159
> ................... [...]
>
>
> Expected Output:
> in: 50007
> out: 50007
> [Euro symbol]............ [...]
>
>
> The problem does not occur with encoding set to UTF-8. Unfortunately we cannot
> just convert the encoding as the encoding is also declared in the HTML header
> of the input string.
>
>
> -- System Information:
> Debian Release: 6.0.1
>   APT prefers stable-updates
>   APT policy: (500, 'stable-updates'), (500, 'stable')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 2.6.26-2-xen-amd64 (SMP w/2 CPU cores)
> Locale: LANG=de_DE.UTF-8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
>
> Versions of packages php-htmlpurifier depends on:
> ii  php5                    5.3.3-7+squeeze1 server-side, HTML-embedded scripti
>
> Versions of packages php-htmlpurifier recommends:
> ii  php5-cli                5.3.3-7+squeeze1 command-line interpreter for the p
>
> php-htmlpurifier suggests no packages.
>
> -- no debconf information


