Re: territory-based collations and optimizations for the LIKE operator

Rick Hillegas Mon, 25 Aug 2008 11:03:53 -0700

Thanks, Mamta. I'm having a hard time seeing these issues addressed bythe descriptions of DERBY-1478 and DERBY-2793. Do you think it mightmake sense to log a new issue and link it to these old ones?


Thanks,
-Rick


Mamta Satoor wrote:

Yes, that is correct.

For 1), the disabling of optimizaiton for LIKE for a territory based
database has jira entry DERBY-1478 for it.

For 2), we have jira entry DERBY-2793

Mamta


On 8/25/08, Rick Hillegas <[EMAIL PROTECTED]> wrote:

Thanks, Mamta. This is very helpful. Based on your responses, I think that
the following summarizes how LIKE behaves in databases with territory based
collations:

1) The LIKE optimizations are disabled. For example, given the following
query where A is an indexed string column

 select * from T where T.A like 'foo%'

Derby cannot  use the index on A and falls back on performing a full table
scan.

2) The LIKE operator has Derby-specific semantics. The Derby-specifics
semantics return a subset of the rows which qualify under the ANSI rules.

Does that sound correct to you?

Thanks,
-Rick


Mamta Satoor wrote:

Rick, the behavior we expect for LIKE and = in a territory based
database can be explained by the following comment that I picked up
from DERBY-3166

**********************************************************

The following applies to territory based database
For LIKE, when comparing a pattern against a value string, we do the
comparison of collation elements(s) for one character at a time for
non-metacharacters. This is different than what is done for =
operation. For =, we compare the collation elements for the entire
string on left hand with the collation elements of the entire string
on the right hand side.
For eg say we are working with a territory where character 'z' has
same collation elements as 'xy'. For such a territory consider 2
clauses in WHERE clause
1)'zcb' = 'xycb'
2)'zcb' LIKE 'xy_b'
For case 1), we will return TRUE because the collation elements for
the entire string 'zcb' will match the collation elements of the
entire string 'xycb'.
For case 2) though, we will return FALSE because collation element(s)
for character 'z' does not match the collation element(s) for
character 'x'. So, as can be seen, the LIKE operation is one character
at a time whereas = operation is the entire string at a time.
In addition, the metacharacter _ in pattern for LIKE will consume
*one* character in the string value. So for an eg clause 'xycb' LIKE
'_cb' will return FALSE because metacharacter _ will consume 'x' and
since 'c' does not match 'y', we will return FALSE.

**********************************************************

Is this the information you were looking for?

thanks,
Mamta

On 8/22/08, Rick Hillegas <[EMAIL PROTECTED]> wrote:

According to the Developer's Guide section titled "Character-based

collation

in Derby", the LIKE operator behaves differently than the = operator.

That

is, if you are using a territory-based collation, two strings might =

one

another but not be LIKE one another (and vice-versa).

In the meantime, according to the section in the Tuning Guide titled
"Character string beginning with constant", LIKE expressions which begin
with a constant (followed by a wildcard) can be transformed into

indexable

expressions involving the ">=" and "<" operators.

To my muddled understanding, it seems that one of the following occurs:

1) The LIKE optimizations are disabled if you are using territory-based
collation

or

2) a LIKE expression may return different results depending on your

indexes

and/or the presence of trailing wildcards in your LIKE expression.

(1) seems like a performance issue which needs to be documented. (2)

seems

like a correctness problem. What is the behavior that we expect?

Thanks,
-Rick

Re: territory-based collations and optimizations for the LIKE operator

Reply via email to