Hi Christoph,
Thanks for replying.
> Ok, I start to understand. The copyright symbol is not part of your
> character set gbk (not used that much in China it seems ;-).
The gbk character set does have a copyright symbol. Surprising? ;-)
This symbol, however, is double-width and requires a Chinese font. That
is perhaps why &copy;, and hence the unicode copyright symbol, is not
mapped to that symbol.
> The problem
> with this is that you are exploiting a "convenience feature" of modern
> browsers that clashes with the implicit assumption of Kid that one
> output page has only one encoding.
>
I have another opinion here. Since 'gbk' doesn't exclude ascii, it is
quite normal to have the HTML code (don't know whether the Chinese
characters will be correctly displayed on your machine):
<span>&copy; 大管家 </span>
or
<td>大管家</td><td>&nbsp;</td>
(By the way, the unicode for &nbsp; generated by kid also cannot be
mapped to 'gbk'.)
The HTML document itself containing such code is definitely of 'gbk'
encoding, not a mixed encoding of unicode and gbk. It doesn't matter
that to render the &copy;, a browser might choose to use the unicode
symbol. We are dealing with the HTML source here, not the rendering of
it, right?
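
To make the point concrete, here is a minimal sketch in plain Python 3. It uses only the standard codecs, not Kid's own serializer, and the helper name to_gbk_html is made up for this illustration. It assumes that characters without a gbk mapping may be written as numeric character references, so the output stays a pure 'gbk' document:

```python
# Sketch only: plain Python 3 codecs, not Kid's serializer.
# to_gbk_html is a hypothetical helper name.

def to_gbk_html(text: str) -> bytes:
    """Encode to gbk; characters gbk lacks become numeric character references."""
    return text.encode("gbk", errors="xmlcharrefreplace")

source = "<span>\u00a9 大管家</span><td>\u00a0</td>"

# Strict encoding fails because U+00A9 and U+00A0 have no gbk mapping.
try:
    source.encode("gbk")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

page = to_gbk_html(source)
print(strict_ok)            # False
print(page.decode("gbk"))   # <span>&#169; 大管家</span><td>&#160;</td>
```

The Chinese characters stay as ordinary 2-byte gbk sequences; only the two characters that gbk cannot represent are written as references.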
IMHO, as long as the source is valid HTML, we should find a way to
generate it through a templating language. For example, when reading the
HTML source, I want to see the Chinese characters as Chinese characters.
One will go mad if all the Chinese characters are escaped in the
source like:
<span>&copy;2007&#22823;&#31649;&#23478;</span>
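
For comparison, a small Python 3 sketch of how such fully escaped source can come about. This is an assumption for illustration, not a statement about what Kid does internally: serializing to an ASCII-only target forces every Chinese character into a reference, while a gbk target keeps them readable.

```python
# Sketch only: standard-library codecs, not Kid's actual output path.
text = "\u00a92007大管家"

# ASCII target: every non-ASCII character becomes a numeric reference.
escaped = text.encode("ascii", errors="xmlcharrefreplace")

# gbk target: only the character gbk lacks (U+00A9) is escaped.
readable = text.encode("gbk", errors="xmlcharrefreplace")

print(escaped.decode("ascii"))  # &#169;2007&#22823;&#31649;&#23478;
print(readable.decode("gbk"))   # &#169;2007大管家
```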
> Yes, since if you are using utf8 as output encoding, then you don't need
> &copy;, you can simply output \xa0. Of course, you can't convert it to
> gbk because then. You just need to output utf8 to the browser, and all
> these problems disappear. That would be a workaround. Are there any
> reasons you are using gbk instead of utf8? (I assume there are, since I
> just checked that not even Google.cn is using utf8).
>
You know unicode is pretty new. Long before that, to use Chinese
characters, we used gb2312 and after that the gbk encoding, which use 2
bytes to represent a single Chinese character. Most existing
applications in China, including web applications, still use these 2
encodings.
I happen to have to upload generated web content to one such site,
which unfortunately only accepts 'gbk' encoding. For our own
application, we would definitely use utf8, which eliminates lots of
encoding-related nuisances.
--
Hong Yuan
大管家网上建材超市
装修装潢建材一站式购物
http://www.homemaster.cn
_______________________________________________
kid-template-discuss mailing list
kid-template-discuss@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/kid-template-discuss