Re: [jira] Commented: (DERBY-533) Re-enable national character datatypes

Rick Hillegas Wed, 24 Aug 2005 16:02:42 -0700

Hi Roy,

Thanks for your helpful analysis. We should probably pay closer attention to character sets and collations, particularly since MySQL has invested so much effort here.


Cheers,
-Rick

Roy Lyseng wrote:

Hi Rick,
I have only studied the SQL 1992 standard concerning character sets; I hope my understanding is still valid (if it ever was).
Both the CHAR and the NCHAR data types are actually the same data type, CHAR (or CHARACTER), but made up of characters from different character sets. Each database in effect has two default character sets: the one used for CHAR and the one used for NCHAR. But you may also specify an explicit character set for a column, as in NAME CHARACTER(100) CHARACTER SET UTF8. The character set used for CHAR can also be overridden per schema.
Thus, when you create a database, you should be able to specify that the default character set for CHAR columns is ASCII and the character set used for NCHAR is UTF8.
Note also that, according to the SQL standard, values of type CHAR but with different character sets are not generally comparable.
Each character set will also have a default collation. In a database with full SQL support for character sets and collations, you might use this to say that both CHAR and NCHAR store UTF16 characters, but that CHAR has a binary collation and NCHAR has a French collation.
SQL will also allow you to override a collation specification, e.g. on an ORDER BY clause, and though it is not specified by the SQL standard, you might be able to create an index using a national ordering sequence.
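The comparability rule can be sketched in Java. This is a hypothetical model, not Derby code: `CharValue` is an invented class that tags each string with a character set name, and it refuses to compare values whose character sets differ.

```java
import java.util.Objects;

// Hypothetical model (not Derby code): an SQL character value tagged with its character set.
final class CharValue {
    final String charset; // illustrative names such as "ASCII" or "UTF8"
    final String text;

    CharValue(String charset, String text) {
        this.charset = Objects.requireNonNull(charset);
        this.text = Objects.requireNonNull(text);
    }

    // Values are compared only when they share a character set; otherwise the
    // comparison is rejected, mirroring the rule that such values are not comparable.
    int compareTo(CharValue other) {
        if (!charset.equals(other.charset)) {
            throw new IllegalArgumentException(
                "cannot compare " + charset + " value with " + other.charset + " value");
        }
        return text.compareTo(other.text);
    }

    public static void main(String[] args) {
        CharValue a = new CharValue("ASCII", "abc");
        CharValue b = new CharValue("ASCII", "abd");
        CharValue n = new CharValue("UTF8", "abc");
        System.out.println(a.compareTo(b) < 0); // true
        try {
            a.compareTo(n);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // cannot compare ASCII value with UTF8 value
        }
    }
}
```

A real engine would raise an SQL error at compile time rather than throw at run time; the sketch only shows where the check sits.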
Cheers,
Roy

Rick Hillegas (JIRA) wrote:
[http://issues.apache.org/jira/browse/DERBY-533?page=comments#action_12319919]
Rick Hillegas commented on DERBY-533:
-------------------------------------
1) There are some interesting issues here. Let's say that we re-enable these datatypes in 10.2. What happens when a client application selects from an NCHAR column under the following combinations? What should the ResultSetMetaData say the column is? Is the following reasonable?
| NETWORK CLIENT | CLIENT PLATFORM | RESULT TYPE |
|----------------|-----------------|-------------|
| Derby 10.2     | jdk1.4          | NCHAR       |
| Derby 10.2     | jdk1.6          | NCHAR       |
| Derby 10.1     | jdk1.4          | CHAR        |
| Derby 10.1     | jdk1.6          | CHAR        |
| db2jcc         | jdk1.4          | CHAR        |
| db2jcc         | jdk1.6          | CHAR        |
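The proposed table can be encoded as a small lookup. This is a hypothetical sketch, not Derby code: `NetworkClient` and `NcharMetadataPolicy` are invented names. Because the client platform never changes the answer in the proposal, only the network client is consulted.

```java
// Hypothetical encoding of the proposed table (not Derby code): the type name that
// ResultSetMetaData would report for an NCHAR column, per network client.
enum NetworkClient { DERBY_10_2, DERBY_10_1, DB2JCC }

final class NcharMetadataPolicy {
    // The client platform (jdk1.4 vs jdk1.6) does not affect the result in the
    // proposed table, so it is not a parameter here.
    static String reportedType(NetworkClient client) {
        switch (client) {
            case DERBY_10_2:
                return "NCHAR"; // a 10.2 client understands the national type
            case DERBY_10_1:
            case DB2JCC:
                return "CHAR";  // older clients see the column as plain CHAR
            default:
                throw new IllegalArgumentException("unknown client: " + client);
        }
    }

    public static void main(String[] args) {
        for (NetworkClient c : NetworkClient.values()) {
            System.out.println(c + " -> " + reportedType(c));
        }
    }
}
```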
Since all of our string datatypes are represented as Unicode, I think it is OK, as necessary, to implicitly cast CHAR to NCHAR going from client to server.
I also think it is reasonable to raise an exception if someone runs a 10.1 server against a 10.2 database.
2) I don't see where the SQL standard addresses coercion between national strings and other types. Part 2 section 4.2.1 says that NATIONAL CHARACTER is "implementation defined". Part 2 section 6.12 lists legal and forbidden CASTs but says nothing about national string types. As always, I welcome being educated about what else might be relevant in the spec.
Oracle supports the following coercions but not the inverse coercions, and the Oracle documentation does not address localization issues:
   Datetime/Interval -> NCHAR/NVARCHAR2
   Number -> NCHAR/NVARCHAR2

MySQL does not advertise any ability to cast to/from national strings.

DB2 and Postgres do not support national strings.
In summary, I do not see much guidance here. Derby's previous behavior seems reasonable to me: applying localization when coercing between national strings and other types.
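The difference between a plain coercion and a localized one can be sketched with the JDK's `NumberFormat`. This is a hypothetical illustration, not Derby code: `NationalCoercion` is an invented class, and it assumes the locale to apply is the one bound to the database.

```java
import java.text.NumberFormat;
import java.util.Locale;

// Hypothetical sketch (not Derby code) contrasting a locale-independent coercion
// with a localized one, as Derby's previous behavior applied to national strings.
final class NationalCoercion {
    // Non-national coercion: the same rendering regardless of locale.
    static String toChar(double value) {
        return String.valueOf(value);
    }

    // National coercion: render using the database locale (assumed here to be the
    // locale bound to the database when it was created).
    static String toNationalChar(double value, Locale databaseLocale) {
        return NumberFormat.getInstance(databaseLocale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(toChar(1234.5));                          // 1234.5
        System.out.println(toNationalChar(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```

The inverse direction (parsing a localized national string back into a number) would need the same locale, which is one reason the direction of permitted coercions matters.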
Re-enable national character datatypes
--------------------------------------

        Key: DERBY-533
        URL: http://issues.apache.org/jira/browse/DERBY-533
    Project: Derby
       Type: New Feature
 Components: SQL
   Versions: 10.1.1.0
   Reporter: Rick Hillegas
SQL 2003 coyly defines national character types as "implementation defined". Accordingly, there is considerable variability in how these datatypes behave. Oracle and MySQL use these datatypes to store Unicode strings. This would not distinguish national from non-national character types in Derby, since Derby stores all strings as Unicode sequences.
The national character datatypes (NCHAR, NVARCHAR, NCLOB and their synonyms) used to exist in Cloudscape but were disabled in Derby. The disabling comment in the grammar says "need to re-enable according to SQL standard". Does this mean that the types were removed because they chafed against SQL 2003? If so, what are their defects?
------------------------------------------------------------------
Cloudscape 3.5 provided the following support for national character types:
- NCHAR and NVARCHAR were legal datatypes.
- Ordering operations on these datatypes were determined by the collating sequence associated with the locale of the database.
- The locale was a DATABASE-wide property which could not be altered.
- Ordering on non-national character datatypes was lexicographic, that is, character by character.
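The two orderings can be contrasted with the JDK's `java.text.Collator`. This is a hypothetical sketch, not Cloudscape code: `NationalOrdering` is an invented class, and the English locale stands in for whatever locale the database was created with.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch (not Cloudscape code) of the two orderings described above.
final class NationalOrdering {
    // Non-national types: lexicographic, character by character.
    // String.compareTo compares UTF-16 code units, so every uppercase ASCII
    // letter sorts before every lowercase one.
    static List<String> sortNonNational(List<String> values) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(null); // null comparator = natural String order
        return sorted;
    }

    // National types: the collating sequence of the database locale.
    static List<String> sortNational(List<String> values, Locale databaseLocale) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Collator.getInstance(databaseLocale));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "Banana", "cherry");
        System.out.println(sortNonNational(names));             // [Banana, apple, cherry]
        System.out.println(sortNational(names, Locale.ENGLISH)); // [apple, Banana, cherry]
    }
}
```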
------------------------------------------------------------------
Oracle 9i provides the following support for national character types:
- NCHAR, NVARCHAR2, and NCLOB datatypes are used to store Unicode strings.
- Sort order can be overridden per SESSION or even per QUERY, which means that these overridden sort orders are not supported by indexes.
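Why an overridden sort order defeats an index can be sketched as follows. This is a hypothetical illustration, not Oracle code: `IndexOrderCheck` is an invented class, a `TreeSet` stands in for an index built in binary order, and a French `Collator` stands in for a session-level override. An index scan can satisfy ORDER BY only if its keys are already sorted under the query's ordering.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch (not Oracle code): an index stores keys in one fixed order,
// so scanning it satisfies ORDER BY only when the query uses that same order.
final class IndexOrderCheck {
    // True if the keys are already in non-decreasing order under `order`.
    static boolean isSortedBy(List<String> keys, Comparator<? super String> order) {
        for (int i = 1; i < keys.size(); i++) {
            if (order.compare(keys.get(i - 1), keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "Index" built with binary (UTF-16 code unit) ordering.
        TreeSet<String> index = new TreeSet<>();
        index.add("resume");
        index.add("rose");
        index.add("r\u00e9sum\u00e9"); // "résumé"
        List<String> scan = new ArrayList<>(index);

        Comparator<Object> sessionOrder = Collator.getInstance(Locale.FRENCH);
        System.out.println(isSortedBy(scan, Comparator.naturalOrder())); // true
        System.out.println(isSortedBy(scan, sessionOrder));              // false: extra sort needed
    }
}
```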
------------------------------------------------------------------
DB2 does not appear to support national character types. Nor does its DRDA data interchange protocol.
------------------------------------------------------------------
MySQL provides the following support for national character types:
- National Char and National Varchar datatypes are used to hold Unicode strings. I cannot find a national CLOB type.
- The character set and sort order can be changed at SERVER-wide, TABLE-wide, or COLUMN-specific levels.
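Settings at several levels are typically resolved by letting the most specific one win. The sketch below assumes that column overrides table and table overrides server; it is a hypothetical illustration, not MySQL code, `CollationResolver` is an invented class, and the collation names are only examples.

```java
import java.util.Optional;

// Hypothetical sketch (not MySQL code): assumes the most specific setting wins,
// falling back from column to table to the server-wide default.
final class CollationResolver {
    static String resolve(Optional<String> columnSetting,
                          Optional<String> tableSetting,
                          String serverDefault) {
        return columnSetting.orElseGet(() -> tableSetting.orElse(serverDefault));
    }

    public static void main(String[] args) {
        // Collation names are illustrative.
        System.out.println(resolve(Optional.empty(), Optional.empty(), "latin1_swedish_ci"));
        System.out.println(resolve(Optional.empty(), Optional.of("utf8_general_ci"), "latin1_swedish_ci"));
        System.out.println(resolve(Optional.of("utf8_bin"), Optional.of("utf8_general_ci"), "latin1_swedish_ci"));
    }
}
```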
------------------------------------------------------------------
If we removed the disabling logic in Derby, I believe that the following would happen:
- We would get NCHAR, NVARCHAR, and NCLOB datatypes.
- These would sort according to the locale that was bound to the database when it was created.
- We would have to build DRDA transport support for these types.
The difference between national and non-national datatypes would be their sort order.
I am keenly interested in understanding what defects (other than DRDA support) should be addressed in the disabled implementation.
