Brad Jorsch wrote:
> On Mon, Mar 23, 2009 at 09:31:47AM +0800, [email protected] wrote:
>> Just curious, are formats json, jsonfm, and rawfm not supposed to pass
>> UTF-8 unescaped, and format wddx supposed to turn it into some other
>> format?
> 
> The PHP json_encode function and the fallback code (in case PHP's
> json_encode is unavailable or buggy) both automatically escape non-ASCII
> unicode characters. This does make the response slightly larger, but
> doesn't hurt anything and can help clients with b0rken utf8 support.
> 
> PHP's wddx_serialize_value seems to have a bug in some 5.2.x and 5.3.x
> versions of PHP: it treats the input as iso-8859-1 and converts it to
> utf8, so actual utf8 input gets "double"-utf8-encoded.[1] See
> http://bugs.php.net/bug.php?id=45314 for more info. I guess a bugcheck
> (like the one we added for JSON and PHP bug 46944) would be in order for
> the WDDX formatter.
> 
Added one in r48713.

> Speaking of which, I just noticed that our JSON bugcheck is wrong: the
> correct output is '"\ud840\udc00"', not '\ud840\udc00'.
> 
Yeah, that's a silly mistake, good catch. Fortunately, the typo didn't cause
any misformatting: the check could never pass, so PHP's formatter was never
used.
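
To illustrate the expected string, here is a minimal sketch in Python rather than PHP (Python's json module is only a stand-in here; this is not the code in the API formatter). It assumes the bugcheck's test character is U+20000, which is what the surrogate pair \ud840\udc00 decodes to, and shows that an ASCII-escaping JSON encoder returns the pair wrapped in double quotes:

```python
import json

# U+20000 lies outside the Basic Multilingual Plane, so an ASCII-only
# JSON encoder has to emit it as a UTF-16 surrogate pair.
ch = "\U00020000"

# ensure_ascii=True is the default, so non-ASCII characters are escaped.
encoded = json.dumps(ch)

# Derive the surrogate pair by hand to cross-check the encoder.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
expected = '"\\u%04x\\u%04x"' % (high, low)

print(encoded)
assert encoded == expected
```

The surrounding quotes are part of the encoded value, so a comparison against the bare escape sequence can never match.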

> [1] For example, the character U+00A0 is represented by the two bytes
>     c2 a0 in utf8. The buggy wddx_serialize_value interprets that as two
>     iso-8859-1 characters, and converts each to utf8: c3 82 c2 a0
>
I used this exact example to test the WDDX formatter.
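
For anyone who wants to reproduce the byte sequence from the footnote, a minimal sketch in Python (not PHP, and not the actual test in the formatter) could look like this. The helper name double_encode is hypothetical; it only mimics the behaviour described in the bug report, namely treating UTF-8 bytes as iso-8859-1 and converting them to UTF-8 again:

```python
def double_encode(utf8_bytes: bytes) -> bytes:
    """Mimic the reported bug: read UTF-8 bytes as iso-8859-1,
    then encode the resulting characters as UTF-8 a second time."""
    return utf8_bytes.decode("iso-8859-1").encode("utf-8")

# U+00A0 (no-break space) is the two bytes c2 a0 in UTF-8.
original = "\u00a0".encode("utf-8")
mangled = double_encode(original)

print(original.hex(" "))  # c2 a0
print(mangled.hex(" "))   # c3 82 c2 a0
```

A bugcheck along these lines would serialize a known non-ASCII character and compare the result against the correctly encoded bytes.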

Roan Kattouw (Catrope)


_______________________________________________
Mediawiki-api mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-api
