Re: Language based matching

Rick Hillegas Tue, 11 Jul 2006 08:44:13 -0700

Hi Kathey,

Here is my understanding of how the disabled national string types worked:

1) A national string type used the collation ordering appropriate to the locale of the database. That collation ordering, in turn, was specified by the JDK and could not be overridden.

2) The collation ordering determined the meaning of <, =, and > for national strings. For a given locale, the rules can be quite tricky. If you're not familiar with a locale, you are likely to be surprised by the visibly different strings which nevertheless turn out to be = to one another.

3) The locale-sensitive meaning of <, =, and > affected the operation of all orderings of national strings, including sorts, indexes, unions, group-by's, like's, between's, and in's.
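
To make points 2) and 3) concrete, here is a minimal sketch using the JDK's java.text.Collator directly. It is not Derby or Cloudscape code. The class and method names (CollationSketch, primaryCollator, equalUnderCollation, sortWith) are made up for illustration, and the choice of PRIMARY strength is an assumption; the strength the national string types actually used is not stated in this thread.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; none of these names come from Derby.
class CollationSketch {

    // Assumption: PRIMARY strength, which ignores case differences.
    static Collator primaryCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // "=" as defined by the collator rather than by code points.
    static boolean equalUnderCollation(Collator collator, String a, String b) {
        return collator.compare(a, b) == 0;
    }

    // A sort driven by the collator; the input list is left untouched.
    static List<String> sortWith(Collator collator, List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    public static void main(String[] args) {
        Collator english = primaryCollator(Locale.ENGLISH);

        // Visibly different strings that are nevertheless "=" to one another.
        System.out.println("abc".equals("ABC"));
        System.out.println(equalUnderCollation(english, "abc", "ABC"));

        // The same values sorted by code point and by the collator.
        List<String> values = Arrays.asList("Zebra", "apple");
        List<String> byCodePoint = new ArrayList<>(values);
        byCodePoint.sort(null); // natural String ordering
        System.out.println(byCodePoint);
        System.out.println(sortWith(english, values));
    }
}
```

Any operation built on <, =, and > (an index, a GROUP BY, a BETWEEN) would inherit the same differences that the sort shows here.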

At one point I was keen on re-enabling the national string types. Now I am leaning toward implementing the ANSI collation language. I think this is more powerful. In particular, it lets you support more than one language-sensitive ordering in the same database.

You and your customer face a hard problem trying to migrate national strings from Cloudscape 5.1.60 into Derby 10.1.3 or 10.2. I'm at a loss as to how to do this in a way that preserves Cloudscape's performance.


Regards,
-Rick

Kathey Marsden wrote:

Bernt M. Johnsen wrote:
"aa" as one letter was removed from the Norwegian language in 1938 ("å"
had been optional since 1917). It is only used in names today and it is
true what Anders says about the phonebook (also about the foreign names
where "aa" is treated like two letters). I don't think it would be wise
to not let "a.*" match "Aasen" (which in modern writing would be Åsen).
Thank you so much Knut Anders and Bernt for the clarification on "aa". I guess now I need a new example and need to understand how locale-specific LIKE processing is functionally different from regular LIKE behavior and when it is required.

The user I have been working with is actually migrating from Cloudscape 5.1.60 National Character types and the goal was to get a workaround to achieve the same behavior in Derby. The example came from the doc:
http://publibfi.boulder.ibm.com/epubs/html/cloud51/doc/html/coredocs/sqlj105.htm#1178996
Clearly the Derby code still has the code path for the National Type special processing. In org.apache.derby.iapi.types.SQLChar we have a separate code path for National Character types that passes the Collator. How is this functionally different from LIKE processing for regular character types? Can anyone think of another example where this special processing might be needed?
Thanks

Kathey

Below is a SQLChar code snippet for reference.
public BooleanDataValue like(DataValueDescriptor pattern)
                            throws StandardException
{
    Boolean likeResult;

    if (! isNationalString())
    {
        // note that we call getLength() because the length
        // of the char array may be different than the
        // length we should be using (i.e. getLength()).
        // see getCharArray() for more info
        char[] evalCharArray = getCharArray();
        char[] patternCharArray = ((SQLChar)pattern).getCharArray();
        likeResult = Like.like(evalCharArray,
                               getLength(),
                               patternCharArray,
                               pattern.getLength());
    }
    else
    {
        SQLChar patternSQLChar = (SQLChar) pattern;
        likeResult = Like.like(getIntArray(),
                               getIntLength(),
                               patternSQLChar.getIntArray(),
                               patternSQLChar.getIntLength(),
                               getLocaleFinder().getCollator());
    }
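
As one hedged way to picture the functional difference between the two branches above, the sketch below contrasts a plain character prefix match with a match over collation elements obtained from the JDK's java.text.CollationElementIterator. It is not Derby's Like.like, and it only handles a literal prefix (the equivalent of a trailing % with no other wildcards). The class and method names are hypothetical, and comparing primary weights only is an assumption made for the illustration.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; this is not how Derby implements LIKE.
class LikePrefixSketch {

    static RuleBasedCollator englishCollator() {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        if (!(collator instanceof RuleBasedCollator)) {
            throw new IllegalStateException("expected a RuleBasedCollator");
        }
        return (RuleBasedCollator) collator;
    }

    // Turn a string into its sequence of primary collation weights.
    static List<Integer> primaryElements(RuleBasedCollator collator, String s) {
        List<Integer> weights = new ArrayList<>();
        CollationElementIterator it = collator.getCollationElementIterator(s);
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            int primary = CollationElementIterator.primaryOrder(e);
            if (primary != 0) {
                weights.add(primary); // skip elements ignorable at the primary level
            }
        }
        return weights;
    }

    // Regular character path: compares UTF-16 code units.
    static boolean charPrefixMatch(String value, String prefix) {
        return value.startsWith(prefix);
    }

    // Collator path: compares primary collation weights instead of characters.
    static boolean collatedPrefixMatch(RuleBasedCollator collator,
                                       String value, String prefix) {
        List<Integer> v = primaryElements(collator, value);
        List<Integer> p = primaryElements(collator, prefix);
        return v.size() >= p.size() && v.subList(0, p.size()).equals(p);
    }

    public static void main(String[] args) {
        RuleBasedCollator english = englishCollator();
        System.out.println(charPrefixMatch("Aasen", "a"));
        System.out.println(collatedPrefixMatch(english, "Aasen", "a"));
    }
}
```

Under these assumptions the character path rejects "Aasen" for the prefix "a" because 'A' and 'a' are different code units, while the collation path accepts it because both letters share a primary weight.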
