RE: Issue when updating or inserting non-ASCII characters to VARCHAR UNICODE column

Schroeder, Alexander Wed, 23 Mar 2005 05:54:50 -0800

Hmm ... I found it. It is definitely not the database, as the database
already receives that odd data and assumes it is UCS2-encoded, so we
can forget about the vtrace.


In the JDBC trace we see:
...
[EMAIL PROTECTED] (UPDATE WEBUSER.TBLELEMENTDEFAULTS SET SELEMENTVALUE = ?, 
SELEMENTVALUEDESCRIPTION = ? WHERE IID = ?)
=> [EMAIL PROTECTED]
[EMAIL PROTECTED] (1, 
test&ccedil;
)
[EMAIL PROTECTED] (2, testç)

This tells us that the insertion of &ccedil; happened in the application. You
have to trace back from the input to find the code that replaces 'ç' with
&ccedil;. Could it be that you wrote the value into a 'hidden' field in a form
on a web page, filled from a correct source but with some 'web encoding'
applied to it, and that it is then passed back somewhat garbled?
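
One way to narrow this down, as a rough sketch only: dump the value as Unicode
code points at each layer (request parameter, business logic, just before
setString) and check where 'ç' (U+00E7) turns into an entity or into two
characters. The class and method names below (CodePointDump, dump,
containsHtmlEntity) are made up for illustration and are not part of the
application or the driver.

```java
// Hypothetical debugging helper; names are illustrative, not from the application.
class CodePointDump {

    // Render each character as U+XXXX so entities and mis-decoded bytes become visible.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // True if the value already contains an HTML character entity such as &ccedil;
    static boolean containsHtmlEntity(String s) {
        return s.matches("(?s).*&[A-Za-z][A-Za-z0-9]*;.*");
    }

    public static void main(String[] args) {
        // A correct value ends in the single code point U+00E7.
        System.out.println(dump("test\u00E7"));
        // The first parameter in the JDBC trace already carries the entity.
        System.out.println(containsHtmlEntity("test&ccedil;")); // prints true
    }
}
```

Calling dump() on the parameter right before it is bound would show whether the
string still ends in U+00E7 at that point.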

The second one contains 'Ã§' instead of the c-cedilla. These are the byte
values C3 A7, which is the c-cedilla (Unicode 00E7) in UTF-8 encoding.
Somewhere between the source of this string and this point, I fear, someone
treated the string not as UTF-8 but as a single-byte 8-bit character set.
The vtrace confirms this: it shows these characters being treated correctly
as UCS2 characters, with two bytes transmitted per character.

   data PART   (3 arguments, size: 32536)
         buf(4089):
         01005400 65007300 74006900 6E006700 2000C300 A7002000 20002000
              T    e   s    t   i    n   g        Ã    §

So the JDBC driver and, later, the database only seem to be the culprits; the
processing error happened before the data reached either of them.
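
To illustrate the effect, here is a minimal sketch, assuming a JDK with
java.nio.charset.StandardCharsets (Java 7 or later) and assuming the wrong
charset was ISO-8859-1; the actual charset in your stack may differ. The class
MojibakeDemo and its methods are hypothetical and only reproduce the byte
pattern seen in the vtrace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the application or the JDBC driver.
class MojibakeDemo {

    // Decode UTF-8 bytes with an 8-bit charset, as a misconfigured layer would.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: each garbled char maps 1:1 back to one original byte.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Hex dump of the string as two-byte little-endian units, like the vtrace buffer.
    static String ucs2Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_16LE)) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00E7");                    // U+00E7 is the c-cedilla
        System.out.println(ucs2Hex(garbled));                     // prints C300A700
        System.out.println(repair(garbled).equals("\u00E7"));    // prints true
    }
}
```

The C300 A700 pattern is what appears in the buffer above, so the string was
already two characters long when it reached the driver.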

I hope this helps to find the error.
Regards
Alexander Schröder
SAP DB, SAP Labs Berlin

> -----Original Message-----
> From: Hellgren, Johan [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, March 23, 2005 2:24 PM
> To: Schroeder, Alexander
> Subject: RE: Issue when updating or inserting non-ASCII 
> characters to VARCHAR UNICODE column
> 
> Alexander,
> 
> On closer inspection the statements are there. However, they 
> don't look exactly the same as the simplified example 
> statements I talked about in my initial mail. I'm sorry for 
> not pointing the differences out.
> 
> The statements in question are "UPDATE 
> WEBUSER.TBLELEMENTDEFAULTS SET SELEMENTVALUE = ?, 
> SELEMENTVALUEDESCRIPTION = ? WHERE IID = ?".
> 
> SELEMENTVALUE is LONG UNICODE, SELEMENTVALUEDESCRIPTION is 
> VARCHAR(200) UNICODE. IID is the primary key. In the 
> statements in the trace I'm using "Testing ç" for both 
> SELEMENTVALUE and SELEMENTVALUEDESCRIPTION.
> 
> In vtrace.txt the statement is on line 311, in jdbctrace.txt 
> on line 4462. I'm reattaching both files to this mail, for 
> your convenience.
> 
> Thanks again for your assistance!
> 
> Regards,
> Johan H
> 
> -----Original Message-----
> From: Schroeder, Alexander [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, March 23, 2005 3:11 AM
> To: Hellgren, Johan
> Subject: RE: Issue when updating or inserting non-ASCII 
> characters to VARCHAR UNICODE column
> 
> Sorry, but I cannot find the UPDATE statement in any of the traces 
> you sent me, so possibly you will need to repeat  that step ...
> 
> Both traces contain the statement text as plain text, so it should
> be possible to grep for it to make sure it is in there.
> 
> Regards
> Alexander Schröder
> 
> PS: Your driver has the version 7.4.4  Build 001-000-156-985.
> This version was built in March 2003, and we have fixed a lot of 
> things (including subtle problems with JDK 1.5.0, which you are
> using) since then, so I would encourage you to use the new driver.
> 
> > -----Original Message-----
> > From: Hellgren, Johan [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, March 22, 2005 5:19 PM
> > To: Schroeder, Alexander
> > Subject: RE: Issue when updating or inserting non-ASCII 
> > characters to VARCHAR UNICODE column
> > 
> > Alexander,
> > 
> > I'm attaching the vtrace and jdbctrace files. The vtrace 
> > should contain just the statement in question, and the 
> > jdbctrace a bit before that, since I can't start the jdbc 
> > trace after having logged into the application (as starting 
> > the trace restarts the app server). I didn't want to cut 
> > anything in the jdbctrace for fear of cutting something 
> > essential. At the end of it, it should trace the same 
> > statement (though executed at a later time than the vtrace).
> > 
> > I'm not sure of the JDBC driver version used. The file is 
> > dated May 16 2004, and I would assume it's the driver that 
> > was current at that time. I retried the statement with the 
> > most current driver that I had (sapdbc-7_6_00_00_3360.jar), 
> > and the results were the same (works for LONG, gets garbled 
> > for VARCHAR).
> > 
> > Thanks again for your prompt assistance.
> > 
> > Regards,
> > Johan H
> > 
> > -----Original Message-----
> > From: Schroeder, Alexander [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, March 22, 2005 10:39 AM
> > To: Hellgren, Johan; [email protected]
> > Subject: RE: Issue when updating or inserting non-ASCII 
> > characters to VARCHAR UNICODE column
> > 
> > Hello Johan,
> > 
> > I'm quite sure that the JDBC driver is not the primary issue,
> > as you use prepared statements. 
> > 
> > To verify what's really going on, please make a JDBC trace
> > (this will show what's supplied to the JDBC calls by your
> >  application), and a vtrace to see what the database is
> > actually getting from the interface.
> > 
> > Look under http://sapdb.2scale.net/moin.cgi/TroubleShooting 
> > for a quick overview on how to produce these traces. 
> > 
> > Also, can you tell the version of the JDBC driver used, and 
> > can you repeat the test with the most recent one, if you use
> > an older version?
> > 
> > Regards
> > Alexander Schröder
> > SAP DB, SAP Labs Berlin
> > 
> > > -----Original Message-----
> > > From: Hellgren, Johan [mailto:[EMAIL PROTECTED] 
> > > Sent: Tuesday, March 22, 2005 4:26 PM
> > > To: [email protected]
> > > Subject: Issue when updating or inserting non-ASCII 
> > > characters to VARCHAR UNICODE column
> > > 
> > > Hello list,
> > > 
> > > I am in the process of enabling a content management system 
> > > of ours for international use. Its architecture is an SAPDB 
> > > 7.4.3.27 database instance as the back end, and Resin 2.1.16 
> > > as the application server, using the SAPDB JDBC driver to 
> > > communicate with the database. The database instance is set 
> > > to _UNICODE=YES.
> > > 
> > > The issue that I'm experiencing is that when inserting or 
> > > updating a field using a PreparedStatement with Sun's JDK 
> > > 1.4.2, odd character conversions occur if the target field is 
> > > of type VARCHAR(100) UNICODE, but things work better if the 
> > > field is of type LONG UNICODE. A specific example for the ç 
> > > (&ccedil;) character is:
> > > 
> > > Let TBLLANGUAGENAME have two columns: ID UNIQUE INTEGER, 
> > > LANGUAGENAME VARCHAR(100) UNICODE
> > > 
> > > Using a prepared statement to execute "UPDATE TBLLANGUAGENAME 
> > > SET LANGUAGENAME=? WHERE IID=?", where ID is an existing 
> > > unique id, and LANGUAGENAME='Français', fails with the ç 
> > > being translated to Ã§. However, the same statement using SQL 
> > > Studio works: the ç gets into the LANGUAGENAME field.
> > > 
> > > If instead the table is defined like this:
> > > 
> > > Let TBLLANGUAGENAME have two columns: ID INTEGER, 
> > > LANGUAGENAME LONG UNICODE,
> > > 
> > > using the same prepared statement with the same parameters as 
> > > above works to an extent. Instead of updating a ç into the 
> > > LONG field, Français gets translated to Fran&ccedil;ais, 
> > > which of course also works for HTML display purposes.
> > > 
> > > What do I need to do to avoid getting non-ASCII character 
> > > mistranslations in VARCHAR fields? Is it even possible, or 
> > > (more likely), am I missing something fundamental?
> > > 
> > > Thanks for your insight!
> > > /Johan Hellgren
> > > 
> > > -- 
> > > MaxDB Discussion Mailing List
> > > For list archives: http://lists.mysql.com/maxdb
> > > To unsubscribe:    
> > > http://lists.mysql.com/[EMAIL PROTECTED]
> > > 
> > > 
> > 
> 

--
MaxDB Discussion Mailing List
For list archives: http://lists.mysql.com/maxdb
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
