Thanks for the URL.

I can well believe that HTML with Big5 characters would misbehave.
The current HTML parser/generator is really only designed for Latin-*
HTML.  In particular, multi-byte character sets (like Big5) probably
only work by accident.  If you want Big5, you should probably use text
pages, not HTML pages.
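
As a rough illustration of why byte-oriented handling is fragile here (a sketch only, not the actual parser code, and not necessarily the cause of the problem you saw): a Big5 character occupies two bytes, and the second byte can fall in the ASCII range. The sample character and variable names below are chosen purely for illustration, and the snippet assumes a Python with the standard `big5` codec.

```python
# Illustration only: this is not the parser's code.
text = u"\u529f"            # one Chinese character, chosen as an example
raw = text.encode("big5")   # Big5 stores it as two bytes

# Code that assumes one byte per character (Latin-1 style) sees two characters.
as_latin1 = raw.decode("latin-1")

# The second byte is 0x5C, the same value as an ASCII backslash, so
# byte-level scanning can mistake half of a character for ASCII.
has_backslash = b"\\" in raw

# Decoding with the correct codec recovers the single original character.
round_trip = raw.decode("big5")

print(len(text), len(as_latin1), has_backslash, round_trip == text)
```

Any code that scans, splits, or escapes the page one byte at a time can therefore cut a character in half or misread its second byte.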

When we upgrade the parser to Python 2, we'll have real character set
support (as well as access to much faster and generally better
XML/HTML parsers), and we should be able to do much better in this
regard.

Bill
