[jira] Commented: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED

Mamta A. Satoor (JIRA) Mon, 22 Oct 2007 11:02:04 -0700

    [ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536768 ]


Mamta A. Satoor commented on DERBY-2967:
----------------------------------------

Knut, yes, I did mean the SQL = operation. Also, thanks for your testing.

Based on various past discussions on the Derby list about aa and å in
Norwegian, I had assumed that the JVM's collation table for Norwegian must
have the same collation element for aa and å. But that is not the case, as
shown by your test case inside ij. I also wrote a very simple test case
outside of Derby (copied below), which shows that the collation elements for
aa and å are different in Norwegian, and that is why the SQL operation
'aa'='å' is returning false.

RuleBasedCollator myCollator =
    (RuleBasedCollator) Collator.getInstance(new Locale("da", "DK"));

System.out.println("what happens if iterator is on aa string");
CollationElementIterator aIterator =
    myCollator.getCollationElementIterator("aa");
System.out.println("next is " + aIterator.next());
System.out.println("offset is " + aIterator.getOffset());
System.out.println("next is " + aIterator.next());
System.out.println("offset is " + aIterator.getOffset());

System.out.println("what happens if iterator is on å string");
aIterator = myCollator.getCollationElementIterator("å");
System.out.println("next is " + aIterator.next());
System.out.println("offset is " + aIterator.getOffset());
System.out.println("next is " + aIterator.next());
System.out.println("offset is " + aIterator.getOffset());

Output of the code above
what happens if iterator is on aa string
next is 7405570
offset is 2
next is -1
offset is 2
what happens if iterator is on å string
next is 7405568
offset is 1
next is -1
offset is 1

So, my example to show different behavior of SQL LIKE and SQL = is not correct. 
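
For reference, the two element values printed above can be split into their primary, secondary and tertiary parts with the static helpers on java.text.CollationElementIterator. This is only a sketch for reading the numbers: the class name DecodeCollationElements and the describe helper are made up for illustration, and the two constants are simply copied from the output above.

```java
import java.text.CollationElementIterator;

class DecodeCollationElements {
    // Values copied from the test output above (da_DK collator).
    static final int ELEMENT_FOR_AA = 7405570;
    static final int ELEMENT_FOR_A_RING = 7405568;

    // Hypothetical helper: formats one collation element as its three orders.
    static String describe(int element) {
        return "primary=" + CollationElementIterator.primaryOrder(element)
             + " secondary=" + CollationElementIterator.secondaryOrder(element)
             + " tertiary=" + CollationElementIterator.tertiaryOrder(element);
    }

    public static void main(String[] args) {
        System.out.println("aa:     " + describe(ELEMENT_FOR_AA));
        System.out.println("\u00E5: " + describe(ELEMENT_FOR_A_RING));
    }
}
```

Decoded this way, the two values have the same primary order (113) and the same secondary order (0), and differ only in the tertiary order (2 for aa, 0 for å). A comparison that takes tertiary differences into account will therefore treat the two strings as different.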

I am wondering if anyone knows of characters in some language that are 
different but have the same collation elements in that language. The test 
case requires a different *number* of characters on each side of =. Having a 
different *number* of characters (but the same collation element(s)) is 
crucial to show the difference between = and LIKE.
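
One candidate that might be worth trying, offered as a sketch that has not been run against Derby: a precomposed å (U+00E5, one char) against a followed by the combining ring above U+030A (two chars). With canonical decomposition turned on, java.text.Collator is documented to decompose canonical variants before comparing, so the two strings should collate as equal while having different lengths. The class name, the helper and the choice of territories are assumptions for illustration. Whether Derby's = and LIKE then behave differently on this pair would still need to be checked.

```java
import java.text.Collator;
import java.util.Locale;

class CombiningRingCandidate {
    static final String PRECOMPOSED = "\u00E5";  // å as a single char
    static final String DECOMPOSED = "a\u030A";  // a + combining ring above

    // Hypothetical helper: compares two strings in the given locale
    // with canonical decomposition switched on.
    static int compareDecomposed(Locale locale, String left, String right) {
        Collator collator = Collator.getInstance(locale);
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(left, right);
    }

    public static void main(String[] args) {
        System.out.println("lengths: " + PRECOMPOSED.length()
                + " vs " + DECOMPOSED.length());
        // Territories taken from the reproduction program in the issue.
        Locale[] locales = { Locale.forLanguageTag("no-NO"), Locale.US };
        for (Locale locale : locales) {
            System.out.println(locale + " compare: "
                    + compareDecomposed(locale, PRECOMPOSED, DECOMPOSED));
        }
    }
}
```

A compare result of 0 together with lengths 1 and 2 would make this pair a usable input for the = versus LIKE comparison.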

> Single character does not match high value unicode character with collation 
> TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_Oct11_07_diff.txt, 
> DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt, 
> DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, 
> patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt, 
> patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, 
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt, 
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D.  It 
> is the same for English or Norwegian. For collation UCS_BASIC it matches 
> fine.  Could you tell me if this is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = 
> DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = 
> DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation UCS_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    }
> public static  void testLikeWithHighestValidCharacter(Connection conn) throws 
> SQLException {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>  
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if (rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else
>        System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
