Daniel John Debrunner wrote:
Rick Hillegas wrote:
Hi Mamta,
Thanks for this extensive write-up. This helps me puzzle through the
issues although I'm afraid I'm still muddled. Some comments follow
inline.
Same here.
Mamta Satoor wrote:
lots of good stuff ...
+1
I think we should avoid violating the SQL standard if we can.
+1
In looking at this I was wondering if it would be useful if it could be
described in terms of SQL Standard constructs how the system columns
come to have a different collation. E.g. is it as though they are
declared at the column level with a <collate clause>, or is it as though
the system schemas are declared with a different character set to the
user schemas?
Would this make a difference in how a string literal or other character
expressions are collated?
For string literals
5.3 SR15) says the collation is the character set collation, and then
5.3 SR14b) says the character set is the the character set of the
SQL-client module that contains the <character string literal>.
So here's it's unclear to me what 'SQL-client module' means in a derby
context.
For a function (or any other declared character type except column
definitions) the collation will come from its data type, which goes to
6.1 SR3b) and 6.1 SR16), which says implementation defined character set.
For a column definition then 11.4 SR10b) specifies the character set as
being the schema's character set.
Thus Derby could have two character sets:
- USER - UCS repertoire with default collation of UCS_BASIC or UNICODE
depending on value of collation JDBC attribute at create database time
- SYSTEM - UCS repertoire with default collation of UCS_BASIC
When a schema is created it is implementation defined as to its
character set (if one is not defined) 11.1 SR5)
So Derby's implementation could be:
user schemas have a character set of USER
system schemas have a character set of SYSTEM
Then ...
- columns would have the required collation (11.4 SR10b))
- functions (& others) would have the required collation (Derby's
implementation could be to pick the schema character set)
which leaves string literals as the issue, what is a 'SQL-client module'?
Dan.
I think that you should carefully consider the implications of using two
character sets. Among other things, it means that two strings with
different character sets are not immediately comparable. And as far as I
know, this applies to literals as well. What this means (I think) is
that if columns in system tables are defined with character set SYSTEM,
columns in user-defined tables are defined with character set USER,
and literals are of type USER, then you cannot immediately compare
literals with the character columns in the system tables.
Another option is to use one character set, but use different collations
for different types of tables. You may define that character columns in
system tables are created using collation UCS_BASIC, while all user
tables are created with a user-defined collation. Because all columns
are defined using the same character set, all columns and literals will
be comparable.
Just remember that when comparing two strings with different defined
collations, you need to consider the collation rules defined by the SQL
standard.
Roy