Re: JDBC - how to insert Strings with mixed characterEncodings

Alec . Cawley Fri, 17 Sep 2004 08:24:49 -0700

Barley <[EMAIL PROTECTED]> wrote on 17/09/2004 15:17:11:

> Say, for example, I want to run an insert like the following:
> 
> java.sql.Statement select = conn.createStatement();
> select.executeUpdate("update test set observerNote='\u201C ... \u00BC'");
> 
> FWIW, u201C is an opening curly quote and u00BC is a fraction representing
> one quarter.
> 
> If I create my JDBC url like this:
> 
> 
> jdbc:mysql://localhost/test?user=test&password=test&useUnicode=true&characterEncoding=cp1250
> 
> then the curly quote is successfully inserted, but not the 'one quarter'
> symbol. However, if I create the url in this way:
> 
> 
> jdbc:mysql://localhost/test?user=test&password=test&useUnicode=true&characterEncoding=latin1
> 
> then the 'one quarter' is inserted but not the curly quotes. I understand
> that the latin1 character set includes the 'one quarter' symbol, but not the
> curly quote and that the cp1250 character set includes the curly quote but
> not the 'one quarter' symbol, but I want a way where I don't have to choose
> a single limited pool of characters.
> 
> How can I insert a String that contains both characters? Isn't there a way
> to enable JDBC/MySql ConnectorJ to be able to insert Strings containing any
> combination of Unicode characters?
> 
> Many thanks to anyone who can clarify this issue.


This answer is stretching my knowledge of character sets, but may help you 
- and if someone corrects me, will help me too.

Latin1 and cp1250 (the Windows Central European code page, which is similar 
to latin2 but not identical) are both 8-bit character sets. By selecting 
one of them, you are asking for your 16-bit Java strings to be mapped down 
to one of two different, and incompatible, 8-bit character sets, then 
mapped back up again on retrieval. When text is mapped down from Unicode to 
an 8-bit set, characters which have no mapping in that character set are, 
I think, converted to the standard "unknown character" symbol ('?'), and 
thus lost. What you actually want is Unicode storage, and for this you 
need to specify a Unicode character set. As I understand it, there are two 
such character sets in MySQL: UTF-8 (utf8) and UCS-2 (ucs2). UTF-8 is a 
variable-length encoding that uses one byte for ASCII and two or three 
bytes for other characters; UCS-2 uses two bytes for every character. 
Either of those will store both your extended characters. Which you use 
depends on your exact needs. If you are largely storing Latin text with a 
few funny characters, you probably want UTF-8. If you are largely storing 
non-Latin characters, you probably want UCS-2.
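
To see the "mapping down" loss without involving a database at all, here is 
a minimal sketch in plain Java. The class and method names are made up for 
illustration, and it assumes your JVM ships the windows-1250 charset (the 
Java name for cp1250), which standard JDK builds normally do. It encodes 
the two characters from the question into each character set and decodes 
them again, which is roughly what happens on the way to and from the 
server.

```java
import java.nio.charset.Charset;

class EncodingRoundTrip {
    // Encode the text into the named charset and decode it again.
    // String.getBytes(Charset) substitutes the charset's replacement
    // byte ('?' for these 8-bit sets) for any character it cannot map.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // Opening curly quote, a space, and the one-quarter fraction.
        String note = "\u201C \u00BC";
        for (String name : new String[] {"ISO-8859-1", "windows-1250", "UTF-8"}) {
            String back = roundTrip(note, name);
            System.out.println(name + " preserves both characters: "
                    + back.equals(note));
        }
    }
}
```

Only the UTF-8 round trip should report that both characters survive; each 
of the 8-bit sets loses one of them.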

If you have not already done so, I suggest you study the manual page on 
the difference between character sets and collations. It is not simple, 
but it is very logical, and when you understand it, it makes this sort of 
problem much easier.

If you are only using Java, it is much the easiest to stick to one of the 
two Unicode character sets and just change collation if you need to. If you 
need to mix Java with 8-bit languages such as C/C++, it gets more 
complicated.
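
For the Java-only case, something like the following might be a starting 
point. It is only a sketch: it reuses the URL, table and column from the 
original question and changes characterEncoding to UTF-8. It assumes the 
observerNote column has itself been defined with a Unicode character set 
such as utf8 (otherwise the server may still lose characters when 
storing), and that Connector/J is on the classpath. The class and method 
names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

class Utf8Insert {
    // Same URL as in the question; only characterEncoding is changed.
    static final String URL =
        "jdbc:mysql://localhost/test?user=test&password=test"
        + "&useUnicode=true&characterEncoding=UTF-8";

    // Assumption: observerNote is stored in a Unicode character set.
    // A bound parameter keeps the text out of the SQL string itself.
    static int updateNote(Connection conn, String note) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("update test set observerNote = ?")) {
            ps.setString(1, note);
            return ps.executeUpdate();
        }
    }

    public static void main(String[] args) throws SQLException {
        // Needs a running MySQL server and the Connector/J driver.
        try (Connection conn = DriverManager.getConnection(URL)) {
            updateNote(conn, "\u201C \u00BC");
        }
    }
}
```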

        Alec


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]
