I should mention that the actual contents of the table t1 does not matter here because my examples use character string constants in the WHERE clause and not the columns from the table t1. The only role table t1 has is if the 2 rows from it get selected or not depending on what the WHERE clause returns.
Mamta On 10/9/07, Mamta Satoor <[EMAIL PROTECTED]> wrote: > > Hi, > > I am currently working on DERBY-2967. This issue involves determining the > behavior of LIKE operation for character string types in a territory based > database. I plan on implementing the findings of DERBY-2967 but wanted to > provide a high-level behavior description here to see if anyone has any > feedback on the proposed behavior. > > Unicode has a concept of Contraction where a user might perceive more than > one character as a single character in a given language. One eg of this > would be 'AA' in Norwegian locale. Although, this Contraction is made of > 2 Unicode characters, 'A' and 'A', a Norwegian user perceives them as > a single character. In addition, in Norwegian, the collation elements for > 'AA' are identical to collation elements for 'Å'. So, the question is what > should SQL operation 'AA' like 'Å' return? Also, is that behavior same > as SQL operation 'AA' = 'Å' ? > > Say there is a table t1 with 2 rows in it. > > Now taking an eg based on table t1 for LIKE > select * from t1 where 'AA' like 'Å' > The query above will return no rows because as per SQL standard, LIKE does > not treat Contractions as one single character. Instead, we work with one > Unicode character at a time in LIKE. So, for the given eg, the first Unicdoe > character in 'AA' which is 'A' does not match with 'Å' and hence no rows > would be returned. > > Now taking an eg based on table t1 for = > select * from t1 where 'AA' = 'Å' > The query above would return 2 rows because as per SQL standard, in = > operation, we take the entire character string and use the locale to > generate the collation elements for that string. And then we compare the > collation elements for the 2 sides of the = operation. Since the 'AA' and > 'Å' both result into identical collation elements, the where clause ends up > being TRUE and hence we return 2 rows. > > Expanding further on LIKE operation, the metacharacter _ will match > exactly one Unicode character(and not Contractions). For eg select * from t1 > where 'AA' like '_' will return 0 rows because _ consumes only the first A > in 'AA' and we still have one A left which is not consumed by the pattern. > In order for the WHERE clause to be true, the SQL should look like select * > from t1 where 'AA' like '__'. Same rule applies for metacharacter %. It will > treat 'AA' in LIKE as 2 distinct characters. For eg, select * from t1 where > 'AAB' like '%AB' will return 2 rows because % consumed only the first A (and > not the Contraction AA). Another eg select * from t1 where 'BAAC' like > 'BA%C' will return 2 rows because % consumed the 2nd A. > > I hope this mail helps understand the behavior of LIKE in a territory > based database. I plan to work on implementing this behavior in next few > days but please let me know if anyone has any questions that I can help > clarify. > > thanks, > Mamta >
