Thanks for the URL. I can well believe that HTML with Big5 characters would misbehave. The current HTML parser/generator is really only designed for Latin-* HTML, where every byte is one character. In particular, multi-byte character sets (like Big5) probably work, if at all, only by accident. If you want Big5, you should probably use text pages, not HTML pages.
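To illustrate why multi-byte text only works by accident in a byte-per-character parser, here is a minimal sketch. It is written for Python 3 rather than the Python version discussed in this thread, and the helper names (`latin1_view`, `big5_view`) are hypothetical illustrations, not part of the actual parser. Passing Big5 bytes straight through round-trips fine, but any character-level operation on the Latin-1 view can split a two-byte character.

```python
# Python 3 sketch (hypothetical helpers, not the real parser's code).
BIG5_BYTES = "中文".encode("big5")  # two Chinese characters, two bytes each

def latin1_view(data: bytes) -> str:
    """Roughly what a Latin-*-only parser sees: one character per byte."""
    return data.decode("latin-1")

def big5_view(data: bytes) -> str:
    """Decoding with the real character set gives one character per glyph."""
    return data.decode("big5")

naive = latin1_view(BIG5_BYTES)
proper = big5_view(BIG5_BYTES)
print(len(naive), len(proper))  # 4 2

# Copying the bytes straight through round-trips, so it can appear to work.
assert naive.encode("latin-1") == BIG5_BYTES

# A character-level operation on the naive view, such as truncating to
# three "characters", leaves a dangling lead byte that is no longer valid Big5.
truncated = naive[:3].encode("latin-1")
try:
    truncated.decode("big5")
except UnicodeDecodeError:
    print("truncated Big5 is no longer valid")
```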
When we upgrade the parser to Python 2, we'll have real character set support (as well as access to much faster and generally better XML/HTML parsers), and we should be able to do much better in this regard.

Bill