Hi Brett,
Some comments inline...
On 4/12/11 6:21 PM, Brett Wooldridge wrote:
I have perhaps a misunderstanding/mis-expectation of a database
created with collation TERRITORY_BASED:PRIMARY.
From the manual:
"Territory based with collation strength PRIMARY. Specify this value
to make Derby behave similarly to many other databases, for which
PRIMARY is commonly the default. PRIMARY typically means that only
differences in base letters are considered significant, whereas
differences in accents or case are not considered significant."
So, I was _hoping_ that indeed Derby would behave similarly to many
other databases, but ...
Assume I create a database as follows:
jdbc:derby:test;create=true;collation=TERRITORY_BASED:PRIMARY
And create a table as follows:
CREATE TABLE foo (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
column1 VARCHAR(255)
)
And insert a test row...
INSERT INTO foo (column1) VALUES ('test')
A "standard" query works as expected ...
SELECT * FROM foo WHERE column1 LIKE 'TE%'
... does indeed perform a case in-sensitive LIKE as desired, and
returns this row:
ID COLUMN1
-------------------------------
1 test
So far, so good. Now this query...
SELECT * FROM foo WHERE id = '1' -- Note the quoted '1' rather than
a raw 1
... returns this error:
Error: Comparisons between 'INTEGER' and 'CHAR (UCS_BASIC)' are not
supported. Types must be comparable. String types must also have
matching collation. If collation does not match, a possible solution
is to cast operands to force them to the default collation (e.g.
SELECT tablename FROM sys.systables WHERE CAST(tablename AS
VARCHAR(128)) = 'T1')
SQLState: 42818
ErrorCode: -1
This query also fails in vanilla databases which have no collation
specified. This behavior is described in the Derby Reference Guide in
the section titled "Data type assignments and comparison, sorting, and
ordering".
Unfortunately, this is VERY unlike other databases. This is a trivial
query which I would expect to work. If Derby knows the collation the
the database, why are casts not automatically performed? I actually
encountered this error first with this query to list all tables:
SELECT T2.* FROM SYS.SYSTABLES systabs, TABLE
(SYSCS_DIAG.SPACE_TABLE(systabs.tablename)) AS T2
WHERE systabs.tabletype = 'T'
ORDER BY isindex, conglomeratename
Which returns a similar error whereby the only solution is to
CAST(systabs.tabletype AS CHAR).
My company is, or rather was, in the process of switching our product
from Derby with default (UCS_BASIC) collation, to
TERRITORY_BASED:PRIMARY in order to solve performance issues related
to the inability to perform case in-sensitive queries (because using a
function like LCASE() will not use an index).
Another solution, but one which takes up more space, is to build an
index on a generated column:
CREATE TABLE foo (
id INTEGER GENERATED BY DEFAULT AS IDENTITY,
column1 VARCHAR(255),
column2 varchar(255) generated always as ( lower( column1 ) )
);
create index lfooidx on foo( column2, id );
select * from foo where column2 = 'abc';
Hope this helps,
-Rick
Unfortunately, given the size of the code-base (several hundred
thousand lines), the fact that a myriad of these CAST related errors
could be hiding in nooks and crannies is forcing us to reconsider.
Even if we _could_ find and correct all queries were something like
an INTEGER is being compared to a string, having to put CAST
everywhere we might interact with system tables seems gross.
Is there any solution to this? Any connection mode, startup
parameter, etc. that we're missing? Is there any reason Derby
shouldn't perform these CAST automatically? No other database I've
used (Oracle, SQL Server, MySQL, PostgreSQL, ...) seem to have this
problem.
Thanks,
Brett Wooldridge