Re: Collation feature discussion

Rick Hillegas Mon, 26 Mar 2007 07:54:06 -0800

Hi Mamta,

Thanks for this extensive write-up. This helps me puzzle through the issues although I'm afraid I'm still muddled. Some comments follow inline.


Mamta Satoor wrote:

lots of good stuff ...
SQL spec also talks in various sections about default collation associated with character repertoire, character set, SQL schema, SQL session. These defaults are used to determine the collation requirement for character string types in ambiguous places. (In a SQL implementation of SQL spec's collation clause support, if the rules for the collation determination are not what the application wants, then the user can use the COLLATE clause to override the SQL spec behavior.) For string literals, which have been the focus of recent collation discussion on the Derby list, SQL spec specifies (Section 5.3 <literal> Syntax Rule 15) that their collation will be the collation of the character set (which is the default collation of the character set. Every character set defined in the SQL implementation is required to have a default collation).

In the absence of the COLLATE clause support in Derby 10.3, I do not think we can follow the SQL spec for string literals' collation type. If we decide to make user defined collation (through the JDBC url attribute COLLATION) as Derby's default collation for the character set, then that would mean that a 10.3 db with COLLATION attribute as TERRITORY_BASED will always have string literals with TERRITORY_BASED collation. This will break our metadata queries which do comparison of SYS character column against string literals. The SYS character column will have UCS_BASIC collation and string literals will have TERRITORY_BASED collation and during the comparison, Derby will end up throwing exception because character strings with different collation can't be compared. If Derby 10.3 had support for COLLATE clause, then we could implement SQL spec behavior for string literal and let metadata queries use the COLLATE clause and users could use the COLLATE clause in their queries against system tables when using string literals.

I wonder if we could just implement the COLLATE clause for string literals. Would that be a lot of work? The metadata queries could be adjusted to use the COLLATE clause. I think we should avoid violating the SQL standard if we can.

So, in the absence of the COLLATE clause in Derby 10.3, what I am proposing is string literals have a collation type of UNKNOWN. When they get used in a collation operation, these UNKNOWN collation types will get their collation from the other operands involved in the operation (requirement here would be that all the operands whose collation type is not UNKNOWN will have the same collation type associated with them. This requirement comes from Section 9.13 Collation determination, Syntax Rule 3e). If all of the operands in collation operation have collation type as UNKNOWN, then the collation chosen would be the one specified through the COLLATION attribute on the JDBC url. This last line makes Derby's collation behavior not match SQL spec Section 9.13 Collation determination, Rule 2 which says "At least one operand shall have a declared type collation." But again, I think in the absence of SQL's COLLATE clause, we have to break some rules to make Derby more flexible. If we go with my proposal for string literal, then a comparison like 'aa' < 'ba' will have the 2 operands with UNKNOWN collation, and Derby's collation algorithm at collation time will choose the value specified for COLLATION attribute in the JDBC url. For function that returns a string datatype, we will have the collation type for that string datatype as UNKNOWN (same as string literals).

Other than this, I think Derby's collation type of UNKNOWN has a clear mapping with "none" value of collation derivation in SQL spec.
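
As I read it, the resolution rule proposed above could be sketched roughly as follows. This is only an illustration under my own assumptions: the class, enum and method names are hypothetical and do not correspond to any existing Derby code, and the choice of exception is arbitrary.

```java
import java.util.Arrays;
import java.util.List;

public class CollationResolution {
    // Hypothetical collation types; UNKNOWN is the placeholder from the proposal.
    enum CollationType { UCS_BASIC, TERRITORY_BASED, UNKNOWN }

    /**
     * Picks the collation for an operation from its operands.
     * databaseDefault stands in for the value of the COLLATION
     * attribute on the JDBC url (an assumption of this sketch).
     */
    static CollationType resolve(List<CollationType> operands,
                                 CollationType databaseDefault) {
        CollationType resolved = CollationType.UNKNOWN;
        for (CollationType operand : operands) {
            if (operand == CollationType.UNKNOWN) {
                continue; // UNKNOWN operands adopt whatever the others have
            }
            if (resolved == CollationType.UNKNOWN) {
                resolved = operand; // first operand with a known collation
            } else if (resolved != operand) {
                // All non-UNKNOWN operands must agree on one collation.
                throw new IllegalArgumentException(
                    "Cannot compare " + resolved + " with " + operand);
            }
        }
        // Every operand was UNKNOWN: fall back to the database setting.
        return resolved == CollationType.UNKNOWN ? databaseDefault : resolved;
    }

    public static void main(String[] args) {
        // SYS column (UCS_BASIC) compared against a string literal (UNKNOWN).
        System.out.println(resolve(
            Arrays.asList(CollationType.UCS_BASIC, CollationType.UNKNOWN),
            CollationType.TERRITORY_BASED));
        // 'aa' < 'ba': both literals are UNKNOWN, so the url setting wins.
        System.out.println(resolve(
            Arrays.asList(CollationType.UNKNOWN, CollationType.UNKNOWN),
            CollationType.TERRITORY_BASED));
    }
}
```

Under this reading, a metadata query comparing a SYS column with a literal would resolve to UCS_BASIC, while a literal-only comparison would take the database setting.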

I'm a little lost here. Could you explain what is meant by the UNKNOWN collation? I don't see this collation described in either the SQL standard or in the spec attached to DERBY-1478.

more good stuff ...

7) For CURRENT_USER, SESSION_USER, SYSTEM_USER, SQL spec Section 6.4 Syntax Rule 4 says that their collation type is the collation of character set SQL_IDENTIFIER. SQL spec Section 4.2.6 Collations talks about SQL_IDENTIFIER's collation being implementation defined. Based on this, we can decide the collation for these USER functions to be UNKNOWN. Again, the argument here is same as used for string literals.

It seems to me that SQL_IDENTIFIERs have collation UCS_BASIC. This is how they behave when it comes to preventing two schema objects from having the same name. And this is what forces a collation of UCS_BASIC on the string columns in the system tables. I'm not sure I understand bullet (7). I hope it is not saying that SQL_IDENTIFIERs have UNKNOWN collation. I think we will get into a lot of trouble if we try to force SQL_IDENTIFIERs to have some collation other than UCS_BASIC.
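
To illustrate why the choice matters, here is a small Java sketch contrasting the two kinds of ordering. It assumes that UCS_BASIC amounts to comparing Unicode values (approximated here by String.compareTo, which compares UTF-16 code units) and that a territory-based collation behaves like java.text.Collator for a locale. The class and method names are hypothetical, and Locale.US is just an example territory.

```java
import java.text.Collator;
import java.util.Locale;

public class CollationOrdering {
    // Approximates UCS_BASIC: order by character value alone.
    static int ucsBasicCompare(String left, String right) {
        return left.compareTo(right);
    }

    // Approximates a territory-based collation using the JDK's Collator.
    static int territoryCompare(String left, String right, Locale territory) {
        Collator collator = Collator.getInstance(territory);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        // Positive: 'a' (U+0061) sorts after 'B' (U+0042) by character value.
        System.out.println(ucsBasicCompare("a", "B"));
        // Negative: alphabetical order puts "a" before "B" in the US locale.
        System.out.println(territoryCompare("a", "B", Locale.US));
    }
}
```

The same pair of strings orders differently under the two rules, which is why mixing them in one comparison is a problem.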


Thanks,
-Rick
