Following the SQL standard the collation derivation of a string literal
and a column reference are both "implicit". This is what the current
collation feature is implementing.
This leads to some interesting cases once Derby supports column specific
collations.
Take:
create table T(NAME CHAR(10) COLLATE X);
Now the boolean expression in a query
NAME = 'Fred'
will fail if the collation for 'Fred' is not X. That could exist with
the current implementation (literals default to the collation of the
current schema) or the original proposal (literals default to the
database's collation).
It gets a little stranger if a comparison expression is used within the
create table:
create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR', 'PLANE',
'BOAT');
Now the create table will fail.
Both of these could be made to work with a COLLATE clause, but it
becomes especially strange in the CHECK clause, as the COLLATE clause
must now appear multiple times (I think).
create table T(TYPE CHAR(10) COLLATE X CHECK TYPE IN ('CAR' COLLATE X,
'PLANE' COLLATE X, 'BOAT' COLLATE X);
In both of these cases it seems (to me) that what the language really
intends is that collation to be used is that of the column reference.
Note that even without multiple user collations, Derby will have this
problem for queries that system columns and user schemas, e.g.
sys.systables.TABLENAME = 'MYTABLE' will fail if compiled in a user schema.
Interestingly both MySQL and Microsoft have solved this problem by
adding additional levels to the derivation concept. SQL Standard
supports three levels: NONE, IMPLICIT and EXPLICIT.
Microsoft has added one: Coercible-default between implicit and none.
http://msdn2.microsoft.com/en-us/library/ms179886.aspx
and string literals map to coercible-default, thus in the above examples
the column's collation will be used.
MySQL instead has a concept of "coercibility" with six levels, 0-5. This
is claimed to be SQL standard but seems to be from an earlier version of
the standard. I found another reference to it in some SQL book.
http://doc.51windows.net/mysql/?url=/mysql/ch10s05.html
(section 10.5.4)
Again string literals are at a different level to column references
leading to column collation being used.
Does anyone know of any correction in the SQL standard regarding this
issue? I haven't been able to find one.
I'm just thinking about the future and trying to ensure that anyone who
implements multiple user collations for Derby won't have to break
backwards compatibility to ensure rational behaviour.
Thanks,
Dan.