Daniel John Debrunner wrote:
Following the SQL standard the collation derivation of a string literal and a column reference are both "implicit". This is what the current collation feature is implementing.

This leads to some interesting cases once Derby supports column specific collations.

Take:

create table T(NAME CHAR(10) COLLATE X);

Now the boolean expression in a query

   NAME = 'Fred'

will fail if the collation for 'Fred' is not X. That could exist with the current implementation (literals default to the collation of the current schema) or the original proposal (literals default to the database's collation).

It gets a little stranger if a comparison expression is used within the create table:

create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR', 'PLANE', 'BOAT');

Now the create table will fail.

Both of these could be made to work with a COLLATE clause, but it becomes especially strange in the CHECK clause, as the COLLATE clause must now appear multiple times (I think).

create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR' COLLATE X, 'PLANE' COLLATE X, 'BOAT' COLLATE X);

In both of these cases it seems (to me) that what the language really intends is that collation to be used is that of the column reference.

Note that even without multiple user collations, Derby will have this problem for queries that system columns and user schemas, e.g. sys.systables.TABLENAME = 'MYTABLE' will fail if compiled in a user schema.

Interestingly both MySQL and Microsoft have solved this problem by adding additional levels to the derivation concept. SQL Standard supports three levels: NONE, IMPLICIT and EXPLICIT.

Microsoft has added one: Coercible-default between implicit and none.

http://msdn2.microsoft.com/en-us/library/ms179886.aspx

and string literals map to coercible-default, thus in the above examples the column's collation will be used.

MySQL instead has a concept of "coercibility" with six levels, 0-5. This is claimed to be SQL standard but seems to be from an earlier version of the standard. I found another reference to it in some SQL book.

http://doc.51windows.net/mysql/?url=/mysql/ch10s05.html
(section 10.5.4)

Again string literals are at a different level to column references leading to column collation being used.
I don't know any standard reference. Both the mysql and oracle rules seem more user friendly with regard at least to comparison between constants and implicit collation references (columns being the case
I usually think of affecting the most usage).


Does anyone know of any correction in the SQL standard regarding this issue? I haven't been able to find one.

I'm just thinking about the future and trying to ensure that anyone who implements multiple user collations for Derby won't have to break backwards compatibility to ensure rational behaviour.

Thanks,
Dan.


Reply via email to