>>>>>>>>>>>> Mamta Satoor wrote (2007-10-09 09:40:53): > [...] > Unicode has a concept of Contraction where a user might perceive more than > one character as a single character in a given language. One eg of this > would be 'AA' in Norwegian locale. Although, this Contraction is made of > 2 Unicode characters, 'A' and 'A', a Norwegian user perceives them as > a single character.
No. Noone in Norway would perceive 'aa' as a single character. > In addition, in Norwegian, the collation elements for 'AA' are > identical to collation elements for 'Å'. So, the question is what > should SQL operation 'AA' like 'Å' return? Also, is that behavior > same as SQL operation 'AA' = 'Å' ? 'AA' and 'Å' will in Norwegian have the same collation weight, but the way the SQL standard is phrased, that does not apply to LIKE which does an character by character match of the strings. I admit that 'aa'= 'å' and 'aa' LIKE 'å' giving different results may seem a bit odd but I am not able to interpret Sections 8.2 and 8.5 differently (On the other hand, that's not the only place the SQL spec is contraintuitive for the average programmer). On the other hand, searching and matching are handled in Unicode T10 sections 1.5 and 8 (SELECT is explicitely mentioned in 1.5). It seems that Unicode is in agreement with how I interpret the semantics for comparison operators but not in agreement with how I interpret the sematics for the LIKE predicate. Does anyone know what MySQL, Oracle, SQL Server etc does with this? If the other major databases are in agreement, we could follow their interpretation if we find it reasonable. -- Bernt Marius Johnsen, Database Technology Group, Staff Engineer, Derby/Java DB Sun Microsystems, Trondheim, Norway
pgpgEeuANUxvh.pgp
Description: PGP signature
