[ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535987 ]
Knut Anders Hatlen commented on DERBY-2967:
-------------------------------------------
One more question: In the discussion above, I see that the special case where
two strings of different length map to the same sequence of collation elements
has been discussed. What about two characters, c1 and c2, which have different
Unicode code points but map to the same sequence of collation elements? Should
both c1 = c2 and c1 LIKE c2 be true? That's how it's implemented, but I'm not
sure whether c1 LIKE c2 should be true or false. I haven't checked what the
standard says (and I'm not sure I want to... ;) ), but it feels a bit strange
that two different characters should be LIKE each other because they have the
same collation elements, when 'aa' is not LIKE 'å' because they are not the
same single character. I understand how we can split the character sequence
'aa' into the single characters 'a' and 'a'. I don't understand how we can take
a single collation element 'aa' and split it into two separate collation
elements 'a' and 'a'. I'm sure the standard says it's correctly implemented,
and I guess its wording will make it quite clear, perhaps even logical, that it
has to be that way. I just wanted to double-check that we had verified it...
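To make the distinction concrete, here is a minimal sketch (hypothetical class CollatedLike, not Derby's code) of character-by-character LIKE matching under a java.text.Collator: '_' consumes exactly one character, '%' any run of characters, and each literal pattern character must compare equal to the single input character at the same position. Under these assumed semantics, two different code points that the collator treats as equal are LIKE each other, while 'aa' can never be LIKE a single character such as 'å', even if the collator considers the two strings equal. Escape characters and surrogate pairs are ignored for brevity.

```java
import java.text.Collator;
import java.util.Locale;

/**
 * Hypothetical sketch of per-character LIKE matching under a collator.
 * Not Derby's implementation; names and structure are illustrative only.
 */
public class CollatedLike {

    /**
     * Returns true if value matches pattern. '_' matches exactly one
     * character, '%' matches any run of characters (including none), and
     * every other pattern character must compare equal (compare == 0) to
     * the single value character at that position. No ESCAPE support.
     */
    public static boolean like(String value, String pattern, Collator collator) {
        // dp[i][j] is true when value[0..i) matches pattern[0..j)
        boolean[][] dp = new boolean[value.length() + 1][pattern.length() + 1];
        dp[0][0] = true;
        for (int j = 1; j <= pattern.length(); j++) {
            // Only a leading run of '%' can match the empty value.
            dp[0][j] = dp[0][j - 1] && pattern.charAt(j - 1) == '%';
        }
        for (int i = 1; i <= value.length(); i++) {
            for (int j = 1; j <= pattern.length(); j++) {
                char p = pattern.charAt(j - 1);
                if (p == '%') {
                    // '%' matches nothing, or absorbs one more value character.
                    dp[i][j] = dp[i][j - 1] || dp[i - 1][j];
                } else if (p == '_') {
                    // '_' consumes exactly one value character.
                    dp[i][j] = dp[i - 1][j - 1];
                } else {
                    // Literal: compare one character against one character
                    // using the collator, not the raw code points.
                    String v = String.valueOf(value.charAt(i - 1));
                    dp[i][j] = dp[i - 1][j - 1]
                            && collator.compare(v, String.valueOf(p)) == 0;
                }
            }
        }
        return dp[value.length()][pattern.length()];
    }

    public static void main(String[] args) {
        Collator en = Collator.getInstance(Locale.ENGLISH);
        en.setStrength(Collator.PRIMARY); // case differences are ignored at PRIMARY
        System.out.println(like("A", "a", en));       // true: different code points, collation-equal
        System.out.println(like("\uFA2D", "_", en));  // true: '_' consumes one character
        System.out.println(like("aa", "\u00e5", en)); // false: two characters vs. one
    }
}
```

If c1 LIKE c2 should instead require identical code points, only the literal-character comparison in this sketch would change; the handling of '_' and '%' stays the same.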
> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
> Key: DERBY-2967
> URL: https://issues.apache.org/jira/browse/DERBY-2967
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.4.0.0
> Reporter: Kathey Marsden
> Assignee: Mamta A. Satoor
> Attachments: DERBY2967_Oct11_07_diff.txt,
> DERBY2967_Oct11_07_stat.txt, DERBY2967_offset_based_diff_Oct02_07.txt,
> DERBY2967_offset_based_stat_Oct02_07.txt, fullcoll.out,
> patch2_setOffset_fullcoll.out, patch2_with_setOffset_diff_Sep2007.txt,
> patch2_with_setOffset_stat_Sep2007.txt, step1_iteratorbased_Sep1507_diff.txt,
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt,
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation, '_' does not match the character \uFA2D. It
> is the same for English and Norwegian. For collation UCS_BASIC it matches
> fine. Could you tell me if this is a bug?
> Here is a program to reproduce it:
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String[] args) throws Exception {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn = DriverManager.getConnection(
>             "jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Territory en_US");
>         conn = DriverManager.getConnection(
>             "jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn)
>             throws SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se) {
>             // drop failure ok (the table may not exist yet)
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>         stmt.close();
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html
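Assuming the territory-based collation is backed by the JVM's java.text.Collator for the database territory, one quick diagnostic is to count how many collation elements '_' and \uFA2D each produce, using the JDK's CollationElementIterator. The helper below (hypothetical class CollationElementCount, not part of Derby) only makes the per-character expansion visible; nothing in this thread establishes that a multi-element expansion is the cause of the failure. The cast to RuleBasedCollator relies on what the stock JDK returns from Collator.getInstance, which the API does not guarantee.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

/** Hypothetical diagnostic helper; not Derby code. */
public class CollationElementCount {

    /** Counts the raw collation elements (including any ignorable ones) for s. */
    public static int countElements(RuleBasedCollator collator, String s) {
        CollationElementIterator it = collator.getCollationElementIterator(s);
        int count = 0;
        while (it.next() != CollationElementIterator.NULLORDER) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Locale[] locales = { Locale.forLanguageTag("no-NO"), Locale.US };
        for (Locale locale : locales) {
            // Assumption: the stock JDK returns a RuleBasedCollator here.
            RuleBasedCollator c = (RuleBasedCollator) Collator.getInstance(locale);
            System.out.println(locale
                    + ": '_' -> " + countElements(c, "_") + " element(s), "
                    + "\\uFA2D -> " + countElements(c, "\uFA2D") + " element(s)");
        }
    }
}
```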
--
This message is automatically generated by JIRA.