The Java solution is working, but it's kind of slow. I thought I'd try what several of you suggested, namely using Tcl to do the conversion instead. Of course I've run into problems here too... nothing could be easy about this! :)

To recap, I'm currently using a translator written in Java, from mandarintools.com. My servlet requests a page from the Traditional Chinese site, setting the charset to UTF-8. It then uses the converter to translate it from UTF-8 to UTF-8S, which is a version of Simplified Chinese that's apparently somewhat obscure, but gives the right results. It is then written out to the client with the charset once again set to UTF-8.

All of my attempts to recreate this in Tcl have resulted in garbage. I started out assuming that the incoming data from ns_httpget would be in UTF-8, since the Traditional site uses it and Tcl strings default to that encoding. So I tried

set page_body [ns_httpget "http://big5.hrichina.org"]
set translated_page_body [encoding convertto gb2312 $page_body]
ns_write $translated_page_body

The outgoing charset is also set to UTF-8, via the old ArsDigita ReturnHeaders proc. But this results in garbage.

After messing with this for a while I decided to make sure I could read the page in and spit it back out without error. Nope. "encoding system" told me that the system encoding is iso8859-1, which seems correct. I've tried all combinations of converting from this, or not, and converting to utf-8, or not, and get garbage no matter what. I've also tried using "encoding system" to set Tcl's encoding to utf-8, but still no joy.
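For reference, here is a minimal sketch of the kind of round-trip check I mean, using only the plain Tcl encoding command. It assumes a hard-coded sample string in place of the page fetched by ns_httpget, and the variable names are just for illustration. It also shows how the character count changes when the same UTF-8 bytes are read as iso8859-1 instead.

```tcl
# Sample Traditional Chinese text (illustrative; stands in for the fetched page).
set original "\u4e2d\u6587\u7db2\u9801"

# Encode the Tcl string into UTF-8 bytes, as they would travel over the wire.
set utf8_bytes [encoding convertto utf-8 $original]

# Decode the bytes back into a Tcl string, as if they had just been read as UTF-8.
set decoded [encoding convertfrom utf-8 $utf8_bytes]

# Decode the same bytes as iso8859-1 instead: every byte becomes one character.
set misread [encoding convertfrom iso8859-1 $utf8_bytes]

puts "round trip ok: [string equal $original $decoded]"
puts "chars when read as utf-8: [string length $decoded]"
puts "chars when read as iso8859-1: [string length $misread]"
```

If the last two counts differ for real page data, the bytes are probably being interpreted in the wrong encoding somewhere along the way.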

Any suggestions?

thanks,

janine


