[jira] Updated: (DERBY-2967) Single character does not match high value unicode character with collation TERRITORY_BASED

Mamta A. Satoor (JIRA) Thu, 11 Oct 2007 11:21:50 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Mamta A. Satoor updated DERBY-2967:
-----------------------------------

    Attachment: DERBY2967_Oct11_07_stat.txt
                DERBY2967_Oct11_07_diff.txt

I am attaching a new patch (DERBY2967_Oct11_07_diff.txt) which is much simpler 
than the earlier patches, because the implementations of LIKE for UCS_BASIC and 
territory based character string types do not differ much (based on the SQL 
standard, as explained in the earlier comments on this Jira entry). I have been 
able to change the existing LIKE code for UCS_BASIC character strings (in 
Like.java) so that it also supports territory based character strings. The 
existing method in Like.java now takes a new parameter, a RuleBasedCollator. 
For UCS_BASIC strings, null is passed. If the RuleBasedCollator is null, we do 
a simple one-character equality check between each non-metacharacter in the 
pattern and the corresponding character in the value string. If the 
RuleBasedCollator is not null, we use it to get the collation element(s) for 
one character at a time, for each non-metacharacter in the pattern and the 
corresponding character in the value string, and compare those collation 
element(s) to establish equality. 
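
To make the comparison described above concrete, here is a minimal sketch of 
the per-character check. It is not the code from the patch: the class name 
LikeCharCompareSketch and the helper charsMatch are hypothetical names chosen 
for illustration, and the real method in Like.java has a different signature. 
The sketch also assumes that Collator.getInstance() returns a 
RuleBasedCollator, which is usual on common JDKs but not guaranteed.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class LikeCharCompareSketch {

    /**
     * Hypothetical helper: compares one character of the value string with
     * one non-metacharacter of the pattern.
     * A null collator stands for UCS_BASIC (plain character equality).
     */
    static boolean charsMatch(char valueChar, char patternChar,
                              RuleBasedCollator collator) {
        if (collator == null) {
            // UCS_BASIC: simple one-character equality check.
            return valueChar == patternChar;
        }
        // Territory based: get the collation element(s) of each character
        // and require the two sequences to be identical.
        CollationElementIterator valueElems =
            collator.getCollationElementIterator(String.valueOf(valueChar));
        CollationElementIterator patternElems =
            collator.getCollationElementIterator(String.valueOf(patternChar));
        int v;
        int p;
        do {
            v = valueElems.next();
            p = patternElems.next();
            if (v != p) {
                return false;
            }
        } while (v != CollationElementIterator.NULLORDER);
        return true;
    }

    public static void main(String[] args) {
        // Assumption: the default collator for en_US is a RuleBasedCollator.
        RuleBasedCollator enUS =
            (RuleBasedCollator) Collator.getInstance(Locale.US);
        System.out.println(charsMatch('a', 'a', null));
        System.out.println(charsMatch('a', 'b', enUS));
        System.out.println(charsMatch('\uFA2D', '\uFA2D', enUS));
    }
}
```

Because a single character can map to more than one collation element, the 
loop compares the full element sequences instead of only the first element. 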

In addition to the change in Like.java described above, I have changed the 
callers of that method to pass the correct value for the RuleBasedCollator. 

Additionally, I have added a test to CollationTest.java for the code changes. 
The existing LIKE tests in CollationTest2.java were very useful in testing my 
changes. Lastly, I changed a few of the existing tests to use different 
character string values so that, when we run the full collation tests, we do 
not see test failures that are genuine because of the nature of their data. 

I would appreciate it if someone has time to review the patch for me. I plan 
to commit this early next week if there are no issues.

> Single character does not match high value unicode character with collation 
> TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2967
>                 URL: https://issues.apache.org/jira/browse/DERBY-2967
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.0.0
>            Reporter: Kathey Marsden
>            Assignee: Mamta A. Satoor
>         Attachments: DERBY2967_Oct11_07_diff.txt, 
> DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt, 
> DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out, 
> patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt, 
> patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt, 
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt, 
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It 
> is the same for English or Norwegian. For collation UCS_BASIC it matches 
> fine.  Could you tell me if this is a bug?
> Here is a program to reproduce.
> import java.sql.*;
> public class HighCharacter {
>    public static void main(String args[]) throws Exception
>    {
>    System.out.println("\n Territory no_NO");
>    Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>    Connection conn = 
> DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Territory en_US");
>    conn = 
> DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    System.out.println("\n Collation UCS_BASIC");
>    conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>    testLikeWithHighestValidCharacter(conn);
>    conn.close();
>    }
> public static  void testLikeWithHighestValidCharacter(Connection conn) throws 
> SQLException {
>    Statement stmt = conn.createStatement();
>    try {
>    stmt.executeUpdate("drop table t1");
>    }catch (SQLException se)
>    {// drop failure ok.
>    }
>    stmt.executeUpdate("create table t1(c11 int)");
>    stmt.executeUpdate("insert into t1 values 1");
>  
>    // \uFA2D - the highest valid character according to
>    // Character.isDefined() of JDK 1.4;
>    PreparedStatement ps =
>    conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>      String[] match = { "%", "_", "\uFA2D" };
>    for (int i = 0; i < match.length; i++) {
>    System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>    ps.setString(1, match[i]);
>    ResultSet rs = ps.executeQuery();
>    if( rs.next() && rs.getString(1).equals("1"))
>        System.out.println("PASS");
>    else
>        System.out.println("FAIL: no match");
>    rs.close();
>    }
>   }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
