I'm not a Python expert, but you can use libiconv to convert the text to UTF-8. I use it with C and PHP, and it probably has Python bindings. It also comes with a small command-line tool called iconv, which you can pipe the page through to get what you need. If you're not sure what your source encoding will be in all cases, I'd also recommend detecting the encoding from the HTML source with a regex and passing the result to iconv as the source encoding.
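
For what it's worth, the same detect-then-convert idea can probably be done with Python's built-in codecs, without iconv at all. Below is a rough sketch, not a tested solution for your site: it is written for Python 3, the names detect_charset and to_utf8 are made up for illustration, and it assumes the page declares its charset in a META tag like the one you quoted, falling back to windows-1255 otherwise.

```python
import re

# Hypothetical helpers for illustration; these names are not from any library.
# Matches the charset=... part of a META Content-Type declaration in raw bytes.
META_CHARSET_RE = re.compile(
    rb'charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def detect_charset(raw_html, default="windows-1255"):
    """Return the charset declared near the top of the page, or the default."""
    match = META_CHARSET_RE.search(raw_html[:4096])
    if match:
        return match.group(1).decode("ascii")
    return default


def to_utf8(raw_html, default="windows-1255"):
    """Decode the raw page bytes and re-encode them as UTF-8."""
    charset = detect_charset(raw_html, default)
    try:
        text = raw_html.decode(charset, errors="replace")
    except LookupError:
        # Python does not know the declared charset; assume the default.
        text = raw_html.decode(default, errors="replace")
    return text.encode("utf-8")
```

You would call to_utf8() on the raw bytes you read from the connection, before handing anything to the HTML parser or to MySQL. Bytes that are invalid in the source encoding are replaced rather than raising an error, which may or may not be what you want.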

Lior Kesos wrote:

Hello Gog (Gang of Geeks),
I'm writing a Python script that is supposed to get some information
off a Hebrew website that has this in its headers...

<META HTTP-EQUIV="Content-Type" content="text/html; charset=windows-1255">

and
<style>
       select{font-family:arial;font-size:13px}
       input{font-family:arial;font-size:13px}
       body{font-family:arial;font-size:13px}
       table{font-family:arial;font-size:13px}
       a{font-family:arial;text-decoration:'underline' ;color:'#000044';text-decoration: none}
       a:hover{font-family:arial;color:'#FF0033';text-decoration: none}
       a:active{font-family:arial;color:'#660000';text-decoration:'underline';}
       #pptw{position:absolute;top:-20px;left:0px;}
</style>

I'm connecting through urllib2.urlopen and using htmllib.HTMLParser to parse it.
I get garbled output when I view the result.
Now I'm pretty weak on the why, but I know I want to store my results
in UTF-8 because it's pretty much what everyone (MySQL) uses to
get past these types of problems.
Is there a way I can cast/transform encodings?

regards

Peace Love and Penguins -
Lior Kesos

===============================================================
To unsubscribe, send mail to [EMAIL PROTECTED] with
the word "unsubscribe" in the message body, e.g., run the command
echo unsubscribe | mail [EMAIL PROTECTED]

