The Java solution works, but it's rather slow, so I thought I'd try
what several of you suggested: using Tcl to do the conversion
instead. Of course I've run into problems here too...
nothing could be easy about this! :)

To recap, I'm currently using a translator written in Java, from
mandarintools.com. My servlet requests a page from the Traditional
Chinese site, setting the charset to UTF-8. It then uses the converter
to translate it from UTF-8 to UTF-8S, which is a version of Simplified
Chinese that's apparently somewhat obscure, but gives the right
results. It is then written out to the client with the charset once
again set to UTF-8.

All of my attempts to recreate this in Tcl have resulted in garbage.
I started out assuming that the incoming data from ns_httpget would be
in UTF-8, since the Traditional site uses it and Tcl strings
default to that encoding. So I tried:

    set page_body [ns_httpget "http://big5.hrichina.org"]
    set translated_page_body [encoding convertto gb2312 $page_body]
    ns_write $translated_page_body

The outgoing charset is also set to UTF-8, via the old ArsDigita
ReturnHeaders proc. But this results in garbage.
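
For reference, here is a minimal sketch of what that convertto call does
by itself, runnable in plain tclsh without AOLserver. The sample string
and the variable names are my own assumptions for illustration, not data
from the site. As far as I can tell, encoding convertto only changes the
byte representation of the same characters; it does not map Traditional
characters to Simplified ones, and the resulting bytes are no longer
UTF-8:

```tcl
# Hypothetical sample text: U+4E2D U+6587, two characters that exist in GB2312.
set sample "\u4e2d\u6587"

# Encode the same two characters as GB2312 bytes and as UTF-8 bytes.
set gb_bytes   [encoding convertto gb2312 $sample]
set utf8_bytes [encoding convertto utf-8 $sample]

# Decoding the GB2312 bytes yields the original characters, unchanged.
set decoded [encoding convertfrom gb2312 $gb_bytes]

puts "gb2312 byte count: [string length $gb_bytes]"
puts "utf-8 byte count:  [string length $utf8_bytes]"
puts "same characters after round trip: [expr {$decoded eq $sample}]"
```

If the two byte counts differ, sending the GB2312 bytes under a UTF-8
charset header would be one plausible source of the garbage.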

After messing with this for a while I decided to make sure I could
read the page in and spit it back out without error. Nope.
"encoding system" told me that the system encoding is iso8859-1, which
seems correct. I've tried all combinations of converting from this,
or not, and converting to utf-8, or not, and get garbage no matter
what. I've also tried using "encoding system" to set Tcl's encoding
to utf-8, but still no joy.
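
The read-and-echo check can be reduced to a sketch like the following,
again in plain tclsh. The byte string is a hypothetical stand-in for
whatever the fetch returns; whether ns_httpget hands back raw bytes or an
already-decoded string is exactly what I haven't been able to pin down,
so treat both cases as assumptions:

```tcl
# Hypothetical stand-in for fetched data: the UTF-8 bytes of U+4E2D U+6587.
set raw_bytes [encoding convertto utf-8 "\u4e2d\u6587"]

# Case 1: treat the data as UTF-8 bytes, decode once, encode once on output.
set text      [encoding convertfrom utf-8 $raw_bytes]
set out_bytes [encoding convertto utf-8 $text]

# Case 2: skip the decode and encode the bytes as if they were characters.
# Every byte above 0x7F becomes two bytes, so the output no longer matches.
set double_encoded [encoding convertto utf-8 $raw_bytes]

puts "characters decoded: [string length $text]"
puts "case 1 matches input: [expr {$out_bytes eq $raw_bytes}]"
puts "case 2 matches input: [expr {$double_encoded eq $raw_bytes}]"
```

Case 2 is the sort of double encoding that could produce garbage if a
conversion is applied to data that has already been converted, or is
skipped where one is needed.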

Any suggestions?

thanks,
janine