Yuan HOng wrote:
> Unfortunately they are not completely equivalent :-(
> 
> I have to use the encoding 'gbk' for the output, which doesn't have a
> corresponding character for \xa9. Trying to convert will raise an
> exception. Using encoding 'gbk' in the serialize function will
> truncate the symbol and all Chinese characters after '2007'.

OK, I am starting to understand. The copyright symbol is not part of your 
character set gbk (not used that much in China, it seems ;-). It is only 
part of latin-1 (and thus Unicode). But if you write &copy; then the 
browser will display it as Unicode anyway. So it is a way for you to use 
Unicode characters although the document is not actually using a 
Unicode charset. This will probably work with all modern browsers; the 
browser will try to switch between fonts automatically. The problem 
with this is that you are exploiting a "convenience feature" of modern 
browsers that clashes with Kid's implicit assumption that one 
output page has only one encoding.

I need to think about this a little more; maybe in the next version I 
will add an optional feature to Kid's HTML serializer to use HTML 
entities for all unicode characters that are not part of the output 
encoding, instead of simply raising an exception.
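
For illustration, Python's own codec machinery already offers this kind of fallback through the "xmlcharrefreplace" error handler. The following is only a sketch in Python 3 syntax using standard-library calls; it is not Kid's serializer, and the sample string is made up:

```python
# Made-up sample text: copyright sign, year, two Chinese characters.
text = "\u00a9 2007 \u4e2d\u6587"

# Strict encoding fails because gbk cannot represent the copyright sign.
try:
    text.encode("gbk")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "xmlcharrefreplace" substitutes a numeric character reference for every
# character the target encoding lacks; everything else is encoded as usual.
encoded = text.encode("gbk", errors="xmlcharrefreplace")

print(strict_ok)                       # False
print(encoded.startswith(b"&#169; "))  # True
```

The result is still a pure gbk byte string, so the page keeps a single encoding while the browser resolves the reference to the Unicode character.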

> I tried using encoding='utf8' and then convert the result to gbk. But
> with 'utf8' encoding the format='named' argument doesn't seem to be
> working and I got \xa0 for &nbsp;, which also is not convertible to
> 'gbk'

Yes, since if you are using utf8 as the output encoding, you don't need 
named entities like &copy; or &nbsp;; you can simply output the 
characters directly (\xa9 or \xa0). Of course, you can't convert that to 
gbk afterwards. You just need to output utf8 to the browser, and all 
these problems disappear. That would be a workaround. Are there any 
reasons you are using gbk instead of utf8? (I assume there are, since I 
just checked that not even Google.cn is using utf8).
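
To make the difference concrete, here is a small sketch (Python 3, standard library only). The helper name unencodable and the sample string are invented for this example:

```python
# Made-up sample: copyright sign, no-break space, year, Chinese text.
page = "\u00a9\u00a02007 \u4e2d\u6587"


def unencodable(s, encoding):
    """Return the characters of s that the given encoding cannot represent."""
    bad = []
    for ch in s:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            bad.append(ch)
    return bad


# utf8 can represent every character, so nothing needs an entity.
print(unencodable(page, "utf-8"))  # []

# gbk lacks both the copyright sign and the no-break space.
print(unencodable(page, "gbk"))    # ['\xa9', '\xa0']
```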

> And another question is why are &copy; and &amp; handled differently?

They are different in that & is ASCII, while © is not; also, 
the former is a special character in HTML/XML that needs to be escaped, 
while the latter has no special meaning in XML.
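
The two steps can be seen separately in a short sketch (Python 3 standard library, not Kid's code; the sample string is made up). Markup escaping touches only the characters that are special in HTML/XML, while the encoding step is what decides the fate of ©:

```python
from html import escape

sample = "Q&A \u00a9 2007"  # made-up sample containing both characters

# Step 1: markup escaping. "&" becomes "&amp;" in every encoding,
# while the copyright sign is left untouched.
escaped = escape(sample)
print(escaped == "Q&amp;A \u00a9 2007")  # True

# Step 2: encoding. Only here does the copyright sign become a problem,
# and only if the target encoding lacks it (ASCII is used as the example).
as_ascii = escaped.encode("ascii", errors="xmlcharrefreplace")
print(as_ascii)  # b'Q&amp;A &#169; 2007'
```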

-- Christoph
