On 05/07/05, Dvir Volk <[EMAIL PROTECTED]> wrote:
> I'm not a Python expert, but you can use libiconv to convert the text to
> UTF-8. I use it with C and PHP, it probably has Python bindings, and it
> also has a small app called iconv, which you can pipe to get what you need.
> If you're not sure what your source encoding will be in all cases, I'd
> also recommend trying to detect the encoding from the HTML source, with
> a regex, and passing the result to iconv as the source encoding.
Python has its own conversion routines, and an internal Unicode
representation. The way to go is to use the decode() string method,
passing it the source encoding, to convert the page to the internal
Unicode representation, and then render that representation in the
encoding of your choice using encode(). For instance:
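# s is a byte string; this assumes the bytes are cp1255-encoded
s = 'Hebrew cp-1255 text שלום'
# the codec is named 'cp1255' (alias 'windows-1255'), not 'cp-1255'
u8 = s.decode('cp1255').encode('utf-8')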
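
For what it's worth, here is a slightly fuller sketch of the same
decode/encode round trip. The transcode() helper and the sample bytes are
made up for illustration; it assumes you already have the page as raw
bytes and know its source encoding.

```python
def transcode(raw, source_encoding, target_encoding='utf-8'):
    """Hypothetical helper: re-encode raw bytes from one encoding to another."""
    text = raw.decode(source_encoding)   # bytes -> internal Unicode
    return text.encode(target_encoding)  # internal Unicode -> bytes

# The word 'שלום' as it would arrive in a cp1255-encoded page
page = b'\xf9\xec\xe5\xed'
utf8_page = transcode(page, 'cp1255')
```

If the declared encoding is wrong, decode() raises UnicodeDecodeError
for any bytes it cannot map, so you may want to catch that and fall back
to another guess.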
-- Arik