Hi Rick,
I have only studied the SQL 1992 standard concerning character sets,
hope my understanding is still valid (if it ever was).
Both the CHAR and the NCHAR data types are actually the same data type
CHAR (or CHARACTER), but made up of characters from different character
sets. Each database has in effect two default character sets, the one
used for CHAR and the one used for NCHAR. But you may also specify an
explicit character set for a column as in NAME CHARACTER(100) CHARACTER
SET UTF8. The character set used for CHAR can also be overridden per schema.
Thus, when you create a database, you should be able to specify that the
default character set for CHAR columns be ASCII, and the character set
used for NCHAR be UTF8.
Note also that according to the SQL standard, values of type CHAR but
with different character sets are not generally comparable.
Each character set will also have a default collation. In a database
with full SQL support for character sets and collations, you might use
this to say that both CHAR and NCHAR store UTF16 characters, but that
CHAR has a binary collation and NCHAR has a French collation.
SQL will also allow you to override a collation specification e.g. on an
ORDER BY statement, and though not specified by the SQL standard, you
might be able to create an index using a national ordering sequence.
Cheers,
Roy
Rick Hillegas (JIRA) wrote:
[ http://issues.apache.org/jira/browse/DERBY-533?page=comments#action_12319919 ]
Rick Hillegas commented on DERBY-533:
-------------------------------------
1) There are some interesting issues here. Let's say that we re-enable these
datatypes in 10.2. What happens when a client application selects from an NCHAR
column under the following combinations? What should the ResultSetMetaData say
the column is? Is the following reasonable?
| NETWORK CLIENT | CLIENT PLATFORM | RESULT TYPE |
|-----------------------------|-----------------------------|----------------------|
| Derby 10.2 | jdk1.4 | NCHAR
|
|-----------------------------|-----------------------------|----------------------|
| Derby 10.2 | jdk1.6 | NCHAR
|
|-----------------------------|-----------------------------|----------------------|
| Derby 10.1 | jdk1.4 | CHAR
|
|-----------------------------|-----------------------------|----------------------|
| Derby 10.1 | jdk1.6 | CHAR
|
|-----------------------------|-----------------------------|----------------------|
| db2jcc | jdk1.4 | CHAR
|
|-----------------------------|-----------------------------|----------------------|
| db2jcc | jdk1.6 | CHAR
|
|-----------------------------|-----------------------------|----------------------|
Since all of our string datatypes are represented as unicode, I think it is ok,
as necessary, to implicitly cast CHAR to NCHAR going from client to server.
I also think it is reasonable to raise an exception if someone runs a 10.1
server against a 10.2 database.
2) I don't see where the SQL standard addresses coercion between national strings and
other types. Part 2 section 4.2.1 says that NATIONAL CHARACTER is "implementation
defined". Part 2 section 6.12 lists legal and forbidden CASTS but says nothing about
national string types. As always, I welcome being educated about what else might be
relevant in the spec.
Oracle supports the following coercions but not the inverse coercions and
Oracle documentation does not address localization issues:
Datetime/Interval -> NCHAR/NVARCHAR2
Number -> NCHAR/NVARCHAR2
MySQL does not advertise any ability to cast to/from national strings.
DB2 and Postgres do not support national strings.
In summary, I do not see much guidance here. Derby's previous behavior seems
reasonable to me: applying localization when coercing between national strings
and other types.
Re-enable national character datatypes
--------------------------------------
Key: DERBY-533
URL: http://issues.apache.org/jira/browse/DERBY-533
Project: Derby
Type: New Feature
Components: SQL
Versions: 10.1.1.0
Reporter: Rick Hillegas
SQL 2003 coyly defines national character types as "implementation defined".
Accordingly, there is considerable variability in how these datatypes behave. Oracle and
MySQL use these datatypes to store unicode strings. This would not distinguish national
from non-national character types in Derby since Derby stores all strings as unicode
sequences.
The national character datatypes (NCHAR, NVARCHAR, NCLOB and their synonymns) used to
exist in Cloudscape but were disabled in Derby. The disabling comment in the grammar says
"need to re-enable according to SQL standard". Does this mean that the types
were removed because they chafed against SQL 2003? If so, what are their defects?
------------------------------------------------------------------
Cloudscape 3.5 provided the following support for national character types:
- NCHAR and NVARCHAR were legal datatypes.
- Ordering operations on these datatypes was determined by the collating
sequence associated with the locale of the database.
- The locale was a DATABASE-wide property which could not be altered.
- Ordering on non-national character datatypes was lexicographic, that is,
character by character.
------------------------------------------------------------------
Oracle 9i provides the following support for national character types:
- NCHAR, NVARCHAR2, and NCLOB datatypes are used to store unicode strings.
- Sort order can be overridden per SESSION or even per QUERY, which means that
these overridden sort orders are not supported by indexes.
------------------------------------------------------------------
DB2 does not appear to support national character types. Nor does its DRDA data
interchange protocol.
------------------------------------------------------------------
MySQL provides the following support for national character types:
- National Char and National Varchar datatypes are used to hold unicode
strings. I cannot find a national CLOB type.
- The character set and sort order can be changed at SERVER-wide, TABLE-wide,
or COLUMN-specific levels.
------------------------------------------------------------------
If we removed the disabling logic in Derby, I believe that the following would
happen:
- We would get NCHAR, NVARCHAR, and NCLOB datatypes.
- These would sort according to the locale that was bound to the database when
it was created.
- We would have to build DRDA transport support for these types.
The difference between national and non-national datatypes would be their sort
order.
I am keenly interested in understanding what defects (other than DRDA support)
should be addressed in the disabled implementation.