Hi Kathey,
My gut feeling is that you are headed off into the tall weeds here. That
said, let me walk with you part way into the swamp. Another feature you
might want would be DERBY-481, computed columns. This would help you get
better performance. So, for instance, you could declare your data like this:
create table foo
(
name varchar(20)
nameKey varchar(60) for bit data generated always as ( locale_order(
'pl', 'PL', name )
);
create index foo_idx on foo( nameKey );
Lacking DERBY-481, you might get away with
create table foo
(
name varchar(20)
nameKey varchar(60) for bit data
);
create index foo_idx on foo( nameKey );
and then use triggers or procedures to populate the nameKey column based
on the value in name.
Then you could get decent performance if you did ORDER BY and GROUP BY
on foo.nameKey. Other operations might look like this:
select * from foo
where nameKey in ( locale_order( 'pl', 'PL', 'dsfaf' ), ... );
select * from foo
where nameKey between locale_order( 'pl', 'PL', 'lkjh' ) and
locale_order( 'pl', 'PL', 'mnbv' );
select * from foo
where nameKey < locale_order( 'pl', 'PL', 'asdfgf'' )
LIKE is going to be a pile of work. I think your LOCALE_MATCHES function
will have to duplicate a lot of the code in Derby. At the end of the
day, you will replace LIKE with LOCALE_MATCHES and so lose the
performance-enhancing query pre-processing which DERBY does for %. Here
the weeds have become too thick for me.
Kathey Marsden wrote:
Rick Hillegas wrote:
3) The locale-sensitive meaning of <, =, and > affected the operation
of all orderings of national strings, including sorts, indexes,
unions, group-by's, like's, between's, and in's.
At one point I was keen on re-enabling the national string types. Now
I am leaning toward implementing the ANSI collation language. I think
this is more powerful. In particular, it lets you support more than
one language-sensitive ordering in the same database.
You and your customer face a hard problem trying to migrate national
strings from Cloudscape 5.1.60 into Derby 10.1.3 or 10.2. I'm at a
loss how to do this in a way that preserves Cloudscape's performance.
Thank you so much Rick for helping me understand this stuff. For now
lets just assume this is just a small dataset and set performance
aside I am interested to know
1) When might Locale specific matching be different in the
context WHERE value LIKE '%< >%' (or whatever language we use)
besides the deprecated Norwegian 'aa' and when might this be useful?
Is it somehow related to bidirectional data like Hebrew and Arabic?
I'm afraid I don't understand the question. I think you are going to
have to duplicate the LIKE processing code, splicing special characters
into subkeys created by LOCALE_ORDER. I don't understand the issues with
Semitic languages but I suspect that Arabic orthography creates some
interesting cases.
2) Is there some easy java code that can be used to accomplish
writing a LOCALE_MATCHES(pattern,value) function?
I'm afraid I can't point you at anything easier than Derby's code.
For the other functionality I have these equivalent functions to
offer as a workaround (see
http://wiki.apache.org/db-derby/LanguageBasedOrdering)
ORDER BY - Use ORDER BY expression with LOCALE_ORDER function
implemented with Collator.getCollationKey()
<, =, > , BETWEEN - Use LOCALE_COMPARE function implemented with
Collator.compare()
IN - Since this is an exact match, would the non-locale specific
matching work ok here?
I'm not sure I understand the question. I don't think you can get around
wrapping local_order around the left and right expressions:
select * from bar where locale_order( 'pl', 'PL', name ) in ( select
locale_order( 'pl', 'PL', name ) from wibble );
GROUP-BY - No solution yet but GROUP BY expression in progress will
allow LOCALE_ORDER to be used.
*LIKE - ???????? * Is there some easy Java regular expression
matching function like String.matches(Collator collator, String
pattern, String value)? I can't find it. The code in
org.apache.derby.iapi.types.Like looks pretty involved, but perhaps
that is what is needed. I just want to confirm before I go down
that path and try to figure it out.f
I agree that this looks pretty involved.
Thanks
Kathey
P.S. I once came very close to getting a cash register meant to
interface to a gas pump working in a Deli with a scale until Mother
Nature stepped in and raised the Russian River to the point that it
swallowed the whole thing up, so I have been known to try too hard for
a workaround. If trying to workaround Locale specific processing
with Derby with FUNCTIONS is a doomed enterprise, I welcome that
perspective as historically I sometimes don't know when to give up.