Barley <[EMAIL PROTECTED]> wrote on 17/09/2004 15:17:11:
> Say, for example, I want to run an insert like the following:
>
> java.sql.Statement select = conn.createStatement();
> select.executeUpdate("update test set observerNote='\u201C ... \u00BC'");
>
> FWIW, u201C is an opening curly quote and u00BC is a fraction
> representing one quarter.
>
> If I create my JDBC url like this:
>
>
> jdbc:mysql://localhost/test?user=test&password=test&useUnicode=true&characterEncoding=cp1250
>
> then the curly quote is successfully inserted, but not the 'one quarter'
> symbol. However, if I create the url in this way:
>
>
> jdbc:mysql://localhost/test?user=test&password=test&useUnicode=true&characterEncoding=latin1
>
> then the 'one quarter' is inserted but not the curly quotes. I understand
> that the latin1 character set includes the 'one quarter' symbol, but not
> the curly quote and that the cp1250 character set includes the curly quote
> but not the 'one quarter' symbol, but I want a way where I don't have to
> choose a single limited pool of characters.
>
> How can I insert a String that contains both characters? Isn't there a
> way to enable JDBC/MySql ConnectorJ to be able to insert Strings
> containing any combination of Unicode characters?
>
> Many thanks to anyone who can clarify this issue.
This answer is stretching my knowledge of character sets, but may help you
- and if someone corrects me, will help me too.

Latin1 and cp1250 (which is close to, but not identical to, latin2) are
both 8-bit character sets. By selecting one of them, you are telling MySQL
to map down from the 16-bit Unicode characters in your Java String to one
of two different, and incompatible, 8-bit character sets, then to map back
up again on retrieval. When it maps down from Unicode to an 8-bit set,
characters which have no mapping in that character set are, I think,
converted to the standard "unknown character" symbol (usually '?'), and
thus lost. What you actually want is storage that can hold any Unicode
character, and for this you need to specify a Unicode character set. As I
understand it, MySQL offers two such character sets: utf8 and ucs2. utf8 is
variable-width (one byte for plain ASCII, two or three bytes for other
characters), while ucs2 uses a fixed two bytes per character. Either of
those will store both your extended characters. Which you use depends on
your exact needs. If you are largely storing Latin text with a few funny
characters, you probably want utf8. If you are largely storing non-Latin
characters, you probably want ucs2.
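
The "map down, then map back up" loss can be reproduced in plain Java with
no database involved. This is only a sketch: the class and method names are
made up for illustration, and it uses Java's own charset tables
(ISO-8859-1 for latin1, windows-1250 for cp1250), which I am assuming
behave like the conversion done on the way to the server for these two
characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Connector/J or MySQL.
class EncodingRoundTrip {

    // Encode the text to bytes in the given charset, then decode it again.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Opening curly quote, a space, and the one-quarter fraction.
        String note = "\u201C \u00BC";

        Charset[] charsets = {
            StandardCharsets.ISO_8859_1,      // latin1
            Charset.forName("windows-1250"),  // cp1250
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16BE         // same bytes as UCS-2 for these characters
        };

        for (Charset cs : charsets) {
            StringBuilder codePoints = new StringBuilder();
            for (char c : roundTrip(note, cs).toCharArray()) {
                // Print code points rather than glyphs to avoid console encoding issues.
                codePoints.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(cs.name() + ": " + codePoints.toString().trim());
        }
    }
}
```

With the two 8-bit charsets, one of the two characters should come back as
U+003F (the question mark); with the two Unicode encodings both should
survive.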

If you have not already done so, I suggest you study the manual page on
the difference between character sets and collations. It is not simple,
but it is very logical, and once you understand it, this sort of problem
becomes much easier.

If you are only using Java, it is by far the easiest to stick to one of
the two Unicode character sets and just change collation if you need to.
If you need to mix Java with 8-bit languages such as C/C++, it gets more
complicated.
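
For the Java-only case, something along these lines is what I have in
mind. Treat it as an untested sketch: the class name is invented, the host,
database and credentials are the placeholders from your message, and it
assumes Connector/J is on the classpath and that the observerNote column
has itself been declared with the utf8 character set.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper class; only the JDBC interfaces are standard.
class UnicodeNoteWriter {

    // Same URL as in the question, but asking the driver for UTF-8
    // instead of cp1250 or latin1.
    static String jdbcUrl() {
        return "jdbc:mysql://localhost/test?user=test&password=test"
             + "&useUnicode=true&characterEncoding=UTF-8";
    }

    // Pass the note as a bound parameter so the driver encodes it,
    // rather than splicing it into the SQL text.
    static int updateNote(Connection conn, String note) throws SQLException {
        try (PreparedStatement ps =
                 conn.prepareStatement("update test set observerNote = ?")) {
            ps.setString(1, note);
            return ps.executeUpdate();
        }
    }
}
```

You would open the connection with
DriverManager.getConnection(UnicodeNoteWriter.jdbcUrl()) and then call
updateNote(conn, "\u201C ... \u00BC"). If the column is still latin1 or
cp1250, I would expect the same character loss as before, whatever the
connection says.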
Alec
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql