If you need to fix a lot of these automatically from a shell script,
you might consider something like this (Python 2):

python -c 'import sys, urllib; print urllib.unquote(" ".join(sys.argv[1:])).decode("utf-8").encode("iso-8859-1")' \
   '%C3%83%C2%A9' \
   '%C3%A4%C2%B8%C2%93%C3%A8%C2%BE%C2%91'

é 专辑

It works like "echo", but it also decodes the %-escaping and removes
one of the two levels of UTF-8 encoding.
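
On Python 3, where urllib.unquote has moved to urllib.parse.unquote and
print is a function, a rough equivalent might look like the sketch below.
The name undo_double_utf8 is made up for illustration. It assumes the text
was UTF-8 encoded exactly twice, with the intermediate bytes misread as
ISO-8859-1 (Latin-1), and it raises an exception on anything else.

```python
from urllib.parse import unquote


def undo_double_utf8(escaped):
    """Undo %-escaping, then strip one of two layers of UTF-8 encoding.

    Hypothetical helper: assumes the text was encoded to UTF-8, misread
    as ISO-8859-1, and then encoded to UTF-8 a second time.
    """
    # Decode the %XX escapes as UTF-8; fail loudly on invalid input.
    once_decoded = unquote(escaped, encoding="utf-8", errors="strict")
    # Map each character back to the single byte it was misread from,
    # then decode those bytes as the UTF-8 they really were.
    return once_decoded.encode("iso-8859-1").decode("utf-8")


# Like "echo": join the repaired arguments with spaces.
args = ["%C3%83%C2%A9", "%C3%A4%C2%B8%C2%93%C3%A8%C2%BE%C2%91"]
print(" ".join(undo_double_utf8(arg) for arg in args))  # é 专辑
```

To use it from a shell script, replace the hard-coded list with
sys.argv[1:].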

On Fri, Oct 31, 2008 at 1:31 PM, Andries E. Brouwer
<[EMAIL PROTECTED]> wrote:
> On Sat, Nov 01, 2008 at 01:51:42AM +0800, Ray Chuan wrote:
>
>> using an edonkey client, which has a function to convert file names to
>> url-friendly strings (aka ed2k links), i was able to see that "é"
>> showed up as %C3%83%C2%A9, while the more complex "专辑"
>> (&#19987;&#36753;) would be %C3%A4%C2%B8%C2%93%C3%A8%C2%BE%C2%91.
>
> You converted twice to UTF-8, so have to go back once.
>
> (é is U+00e9 which is 11000011 10101001 in UTF-8, but if you read
> the latter as Latin-1 and convert once more to UTF-8 you get
> 11000011 10000011 11000010 10101001, that is, %C3%83%C2%A9 as you reported)
>
>
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/
>
>
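
The double conversion Andries describes in the quoted text can be
reproduced step by step. This is only a sketch in Python 3; the helper
names double_utf8 and bits are invented for the example, and it assumes the
second conversion treated the UTF-8 bytes as ISO-8859-1 (Latin-1).

```python
from urllib.parse import quote


def double_utf8(text):
    """Encode to UTF-8, misread the bytes as Latin-1, encode again."""
    misread = text.encode("utf-8").decode("iso-8859-1")
    return misread.encode("utf-8")


def bits(data):
    """Format bytes as space-separated 8-bit binary strings."""
    return " ".join(format(b, "08b") for b in data)


print(bits("é".encode("utf-8")))   # 11000011 10101001
print(bits(double_utf8("é")))      # 11000011 10000011 11000010 10101001
print(quote(double_utf8("é")))     # %C3%83%C2%A9
print(quote(double_utf8("专辑")))  # %C3%A4%C2%B8%C2%93%C3%A8%C2%BE%C2%91
```

The last two lines match the strings the edonkey client produced, which is
consistent with the file names having been converted to UTF-8 twice.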
