[ https://issues.apache.org/jira/browse/DERBY-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mamta A. Satoor updated DERBY-2967:
-----------------------------------
Attachment: step1_iteratorbased_Sep1507_stat.txt
step1_iteratorbased_Sep1507_diff.txt
Attaching a new patch (svn diff is attached as
step1_iteratorbased_Sep1507_diff.txt and svn stat -q is attached as
step1_iteratorbased_Sep1507_stat.txt). This patch does not build the collation
elements for the value string in advance; instead, it fetches collation
elements from the value's CollationElementIterator as they are needed. In
addition, it does not build a CollationElementIterator over the entire pattern
string. The metacharacters in the pattern are compared using their Unicode
values, and the rest of the characters in the pattern have a
CollationElementIterator associated with them. In other words, for the pattern
string, collation elements are used only for non-metacharacters.
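To make the lazy, iterator-based comparison concrete, here is a minimal sketch (not the code in the patch) of matching one run of non-metacharacters against a value CollationElementIterator, pulling value elements only as they are needed. The class and method names are hypothetical. It assumes collation elements are compared as the raw ints returned by CollationElementIterator.next(), and that completely ignorable elements (0) are skipped on both sides; the patch may handle ignorables differently.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class LazyRunMatchSketch {

    // Returns the next collation element that is not completely ignorable (0),
    // or NULLORDER once the iterator is exhausted.
    static int nextNonIgnorable(CollationElementIterator it) {
        int e;
        do {
            e = it.next();
        } while (e == 0);
        return e;
    }

    // Compares the collation elements of one run of non-metacharacters with
    // the next elements of the value, consuming value elements only as needed.
    static boolean matchRun(CollationElementIterator valueIt, String run,
                            RuleBasedCollator collator) {
        CollationElementIterator runIt = collator.getCollationElementIterator(run);
        for (int p = nextNonIgnorable(runIt);
             p != CollationElementIterator.NULLORDER;
             p = nextNonIgnorable(runIt)) {
            // A different element, or NULLORDER because the value ran out, is a mismatch.
            if (nextNonIgnorable(valueIt) != p) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RuleBasedCollator en = (RuleBasedCollator) Collator.getInstance(Locale.US);
        CollationElementIterator value = en.getCollationElementIterator("hello world");
        System.out.println(matchRun(value, "hello", en)); // true
        System.out.println(matchRun(value, " wor", en));  // true: resumes where "hello" stopped
        System.out.println(matchRun(value, "xyz", en));   // false: next value element is for 'l'
    }
}
```

Because the value iterator is shared across calls, each run picks up exactly where the previous one stopped, which is what lets the value's collation elements be fetched on demand rather than built up front.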
The new logic for the LIKE implementation is as follows (this is really the
javadoc for iapi.types.Like:like(CollationElementIterator valueIterator, String
pattern, String escape, RuleBasedCollator collator)). I have 2 questions that
I would appreciate help on. The 2 questions are at the end of the nice :)
javadoc below.
/**
 * This method will be called for character string types with territory
 * based collation. The logic of the method is as follows:
 * A) If the pattern string or the value iterator is null, then this method
 *    will return null, because the result of LIKE can't be established in
 *    such a situation.
 * B) Initialize the pointer into the pattern string to 0.
 * C) Start the loop.
 *    a) Check if we have reached the end of the value iterator. If yes:
 *       1) Check if we have reached the end of the pattern string. If yes,
 *          return TRUE.
 *       2) Check if the pattern string only has % left. If yes, then
 *          return TRUE.
 *       3) If a1) and a2) are not true, then return FALSE.
 *    c) Start looking at the pattern where the pointer is pointing and keep
 *       going until you find the end of the pattern or one of the
 *       metacharacters, i.e. %, _ or the escape character.
 *    d) Get a CollationElementIterator for the non-metacharacters found in
 *       step c (using the Collator passed to this method; the same Collator
 *       was used to construct the CollationElementIterator for the value
 *       string) and make sure that its collation elements match the
 *       collation elements found in the value CollationElementIterator. A
 *       mismatch would require us to return FALSE from this method.
 *    e) Do the checks performed by step Ca).
 *    f) Check which metacharacter the pointer into the pattern is pointing to.
 *       1) If it is the escape character, then convert the next character in
 *          the pattern to its collation element(s) and compare those
 *          collation elements to the elements in valueIterator. If they do
 *          not match, we need to return FALSE.
 *       2) If it is not the escape character, then check if it is a _. If
 *          yes, then skip all the collation elements in valueIterator
 *          corresponding to the next character in value.
 *       3) If it is neither the escape character nor a '_' character, then
 *          check if it is a '%'. If not, then go back to step C). If yes,
 *          then check if we have reached the end of the pattern. If it is the
 *          end of the pattern, then we can simply return from this method
 *          with a TRUE return value. I have a question Q1 (written below). If
 *          the condition checked by the code in Q1 is not satisfied and we
 *          have not reached the end of the pattern, then check if the rest of
 *          the characters in the pattern are all '%'. If yes, then we can
 *          simply return from this method with a TRUE return value. I have
 *          question Q2 at this point.
 *          Q1) I copied the code from the old method implementation which, at
 *              this point, checks if we have reached the end of valueIterator
 *              and, if so, returns TRUE. I think that is incorrect because
 *              even though we have reached the end of valueIterator, there
 *              might be more characters in the pattern that we have not
 *              matched yet.
 *          Q2) What would be the best way to implement the logic to handle
 *              valueIterator for a % found in the pattern?
 *    g) Go back to step C).
 */
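For Q1 and Q2, one possible approach (a sketch under stated assumptions, not the patch's implementation) is to handle % by backtracking over value character positions. To keep it short, the hypothetical CollatedLikeSketch below precomputes the collation elements of each value character instead of walking a single CollationElementIterator, computes elements one character at a time (so contractions that span characters are ignored), compares raw collation element ints with completely ignorable (0) elements dropped, and throws IllegalArgumentException for an escape at the end of the pattern; all of these are assumptions, not Derby behavior.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CollatedLikeSketch {

    // Non-ignorable collation elements of s, one list per code point.
    static List<List<Integer>> perCharElements(RuleBasedCollator col, String s) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            CollationElementIterator it =
                col.getCollationElementIterator(new String(Character.toChars(cp)));
            List<Integer> elems = new ArrayList<>();
            for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
                if (e != 0) elems.add(e); // drop completely ignorable elements
            }
            out.add(elems);
            i += Character.charCount(cp);
        }
        return out;
    }

    // Step A: a null value or pattern gives null (unknown).
    static Boolean like(String value, String pattern, Character escape, RuleBasedCollator col) {
        if (value == null || pattern == null) return null;
        return match(perCharElements(col, value), 0, pattern, 0, escape, col);
    }

    static boolean isMeta(char ch, Character esc) {
        return ch == '%' || ch == '_' || (esc != null && ch == esc);
    }

    // Does value[vi..] match pattern[pi..]? v holds the elements of each value character.
    static boolean match(List<List<Integer>> v, int vi, String p, int pi,
                         Character esc, RuleBasedCollator col) {
        if (pi == p.length()) return vi == v.size(); // both must be used up (cf. Q1)
        char c = p.charAt(pi);
        int litStart;
        int litEnd;
        if (esc != null && c == esc) {              // f1: escaped character is a literal
            if (pi + 1 == p.length()) throw new IllegalArgumentException("escape at end of pattern");
            litStart = pi + 1;
            litEnd = litStart + Character.charCount(p.codePointAt(litStart));
        } else if (c == '_') {                      // f2: exactly one value character
            return vi < v.size() && match(v, vi + 1, p, pi + 1, esc, col);
        } else if (c == '%') {                      // f3/Q2: try every split point
            for (int k = vi; k <= v.size(); k++) {
                if (match(v, k, p, pi + 1, esc, col)) return true;
            }
            return false;
        } else {                                    // c/d: run of non-metacharacters
            litStart = pi;
            litEnd = pi;
            while (litEnd < p.length() && !isMeta(p.charAt(litEnd), esc)) litEnd++;
        }
        List<Integer> target = new ArrayList<>();
        for (List<Integer> e : perCharElements(col, p.substring(litStart, litEnd))) target.addAll(e);
        // Consume whole value characters until their elements equal the target.
        List<Integer> acc = new ArrayList<>();
        for (int k = vi; ; k++) {
            if (acc.equals(target) && match(v, k, p, litEnd, esc, col)) return true;
            if (k == v.size() || acc.size() > target.size()) return false;
            acc.addAll(v.get(k));
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator en = (RuleBasedCollator) Collator.getInstance(Locale.US);
        // Each of the three lines printed here ends with "-> true".
        for (String p : new String[] { "%", "_", "\uFA2D" }) {
            System.out.println("U+FA2D LIKE '" + p + "' -> " + like("\uFA2D", p, null, en));
        }
    }
}
```

Under these assumptions the Q1 shortcut would indeed be unsafe: 'a' LIKE 'a%b' must be FALSE, yet the value is already exhausted right after the %. Plain backtracking is easy to reason about, but it can get slow for patterns containing many % characters, so a real implementation may want something smarter.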
> Single character does not match high value unicode character with collation TERRITORY_BASED
> -------------------------------------------------------------------------------------------
>
> Key: DERBY-2967
> URL: https://issues.apache.org/jira/browse/DERBY-2967
> Project: Derby
> Issue Type: Bug
> Components: SQL
> Affects Versions: 10.4.0.0
> Reporter: Kathey Marsden
> Assignee: Mamta A. Satoor
> Attachments: step1_iteratorbased_Sep1507_diff.txt,
> step1_iteratorbased_Sep1507_stat.txt, temp_diff.txt, temp_stat.txt,
> TestFrench.java, TestNorway.java
>
>
> With TERRITORY_BASED collation '_' does not match the character \uFA2D. It
> is the same for English or Norwegian. For collation UCS_BASIC it matches
> fine. Could you tell me if this is a bug?
> Here is a program to reproduce the problem.
> import java.sql.*;
>
> public class HighCharacter {
>     public static void main(String args[]) throws Exception
>     {
>         System.out.println("\n Territory no_NO");
>         Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
>         Connection conn =
>             DriverManager.getConnection("jdbc:derby:nordb;create=true;territory=no_NO;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Territory en_US");
>         conn =
>             DriverManager.getConnection("jdbc:derby:endb;create=true;territory=en_US;collation=TERRITORY_BASED");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>         System.out.println("\n Collation UCS_BASIC");
>         conn = DriverManager.getConnection("jdbc:derby:basicdb;create=true");
>         testLikeWithHighestValidCharacter(conn);
>         conn.close();
>     }
>
>     public static void testLikeWithHighestValidCharacter(Connection conn) throws
>             SQLException {
>         Statement stmt = conn.createStatement();
>         try {
>             stmt.executeUpdate("drop table t1");
>         } catch (SQLException se)
>         {   // drop failure ok.
>         }
>         stmt.executeUpdate("create table t1(c11 int)");
>         stmt.executeUpdate("insert into t1 values 1");
>         stmt.close();
>
>         // \uFA2D - the highest valid character according to
>         // Character.isDefined() of JDK 1.4;
>         PreparedStatement ps =
>             conn.prepareStatement("select 1 from t1 where '\uFA2D' like ?");
>         String[] match = { "%", "_", "\uFA2D" };
>         for (int i = 0; i < match.length; i++) {
>             System.out.println("select 1 from t1 where '\\uFA2D' like " + match[i]);
>             ps.setString(1, match[i]);
>             ResultSet rs = ps.executeQuery();
>             if (rs.next() && rs.getString(1).equals("1"))
>                 System.out.println("PASS");
>             else
>                 System.out.println("FAIL: no match");
>             rs.close();
>         }
>         ps.close();
>     }
> }
> Mamta made some comments on this issue in the following thread:
> http://www.nabble.com/Single-character-does-not-match-high-value-unicode-character-with-collation-TERRITORY_BASED.-Is-this-a-bug-tf4118767.html
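The reported failure is that '_' does not match the single character \uFA2D. Step f2 of the javadoc requires '_' to skip all the collation elements of exactly one value character, however many elements that character produces. One way to do this with an iterator, sketched below under the assumption that the value iterator is sitting on a character boundary, is to skip by character offset using CollationElementIterator.getOffset() and setOffset() rather than by counting elements. The class and method names are hypothetical, and expansions or contractions around the boundary may need more care than this shows.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.Locale;

public class SkipOneCharacterSketch {

    // Hypothetical helper for step f2: moves the value iterator past the next
    // character (code point) of the value, whatever number of collation
    // elements it produces. Returns false if no character is left.
    static boolean skipOneCharacter(CollationElementIterator valueIt, String value) {
        int start = valueIt.getOffset();
        if (start >= value.length()) {
            return false;
        }
        valueIt.setOffset(start + Character.charCount(value.codePointAt(start)));
        return true;
    }

    public static void main(String[] args) {
        RuleBasedCollator en = (RuleBasedCollator) Collator.getInstance(Locale.US);
        String value = "\uFA2D";
        CollationElementIterator it = en.getCollationElementIterator(value);
        // '_' matches a one-character value if the skip succeeds and nothing is left.
        boolean matched = skipOneCharacter(it, value)
                && it.next() == CollationElementIterator.NULLORDER;
        System.out.println(matched);
    }
}
```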
--
This message is automatically generated by JIRA.