As a workaround, I wrote a logicsheet that converts the wrongly decoded Strings into correct ones, and I call it every time I access a String column. Unsurprisingly, this slows down processing, and I have large amounts of data for reporting: megabytes, actually.
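The conversion itself can be sketched in plain Java rather than as a logicsheet. This is only a minimal sketch, not the actual logicsheet: the class name Utf8Repair and the method repair are hypothetical, and it assumes the driver produced exactly one char per stored byte (in effect, the bytes were decoded as ISO-8859-1). The ASCII fast path is one possible way to cut the cost on mostly English data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Cocoon, ESQL, or the Oracle JDBC driver.
final class Utf8Repair {

    // Re-decodes a String in which every char holds one byte of a UTF-8 sequence.
    static String repair(String s) {
        if (s == null) {
            return null;
        }
        // Fast path: pure ASCII text is identical in both encodings, so skip the copy.
        boolean ascii = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                ascii = false;
                break;
            }
        }
        if (ascii) {
            return s;
        }
        // Recover the original bytes (one per char), then decode them as UTF-8.
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three UTF-8 bytes of U+65E5 (E6 97 A5), one per char.
        String wrong = "\u00e6\u0097\u00a5";
        String fixed = repair(wrong);
        System.out.println(fixed.length());         // prints 1
        System.out.println(fixed.equals("\u65e5")); // prints true
    }
}
```

Note that this must only be applied to Strings that really are mis-decoded: a String that already contains correct characters above U+00FF would be damaged, because ISO-8859-1 cannot represent them.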
> -----Original Message-----
> From: Argyn Kuketayev [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 07, 2002 6:14 PM
> To: Cocoon-Users (E-mail)
> Subject: C2.0.1 ESQL/XSP + UTF-8 encoded Japanese characters in Oracle
>
> Here's my problem:
>
> I use esql tags inside XSP to generate XML from the Oracle database with
> UTF-8 encoding. English characters work fine.
>
> Then comes the issue with Japanese characters:
> 1. Every Chinese or Japanese character is encoded in 3 bytes in UTF-8 and
> stored in the varchar2 column.
> 2. When I use the esql logicsheet to make an XML file from Oracle, the XSP
> is converted into a Java file. Inside the Java code I can see that it uses
> the getString() method of the ResultSet object.
>
> Unfortunately, getString() returns a String object with three characters
> for every Chinese character. What happens is that the Oracle JDBC driver
> makes one character for every byte in the database. So each character has
> an empty upper byte, and its lower byte is one of the bytes of the UTF-8
> representation of the Chinese character.
> Then Cocoon gets the incorrect String and writes &#NUMBER; for every
> character, so the XML has totally wrong strings.
>
> I couldn't make Oracle return the correct String representation of the
> data in the database. Changing the regional settings is not an option.
>
> Now, our Web application (without Cocoon) works fine! How come?
> I'll explain:
> getString() returns the WRONG string, with three characters per Chinese
> character. Then the JSP page uses the out.print() method. This method
> thinks "I'm English, I see three characters, I have to convert them into
> English. I'll simply cut the upper bytes". So print() writes just three
> bytes, and they are the RIGHT bytes! Then the browser sees that the page
> is UTF-8 encoded, takes the three bytes, and shows them as ONE correct
> Chinese character.
>
> The question is: what shall I do?
>
> 1. If I somehow make getString() return the correct String, then
> seemingly my JSPs will break: they will try to print the correct
> character by cutting the upper byte.
>
> 2. If I change Cocoon to use something similar to out.print() from JSP,
> then it may break when somebody changes the regional settings (?).
>
> Argyn
>
> ---------------------------------------------------------------------
> Please check that your question has not already been answered in the
> FAQ before posting. <http://xml.apache.org/cocoon/faqs.html>
>
> To unsubscribe, e-mail: <[EMAIL PROTECTED]>
> For additional commands, e-mail: <[EMAIL PROTECTED]>
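The difference between the JSP path and the Cocoon path described in the quoted message can be reproduced in plain Java. This is a rough sketch under stated assumptions: the class OutputPathDemo and its methods are hypothetical, they only imitate the two behaviours (a writer that encodes as ISO-8859-1, and a serializer that emits numeric character references), and they do not call any JSP or Cocoon API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it imitates the two output paths only.
final class OutputPathDemo {

    // JSP-like path: for chars in the range 0..255, ISO-8859-1 encoding
    // emits just the low byte of each char.
    static byte[] lowBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Cocoon-like path: one numeric character reference per non-ASCII char.
    // (Markup characters such as '<' and '&' are not escaped in this sketch.)
    static String numericRefs(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // What getString() is reported to return for U+65E5: one char per UTF-8 byte.
        String wrong = "\u00e6\u0097\u00a5";
        byte[] utf8 = "\u65e5".getBytes(StandardCharsets.UTF_8);

        // The low bytes of the wrong String are exactly the right UTF-8 bytes.
        System.out.println(Arrays.equals(lowBytes(wrong), utf8)); // prints true

        // The serializer path yields three unrelated character references instead.
        System.out.println(numericRefs(wrong)); // prints &#230;&#151;&#165;
    }
}
```

Under these assumptions the JSP output is correct only by accident: it depends on the writer's encoding dropping the upper byte, which is why a change to that encoding would break it.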