[ 
https://issues.apache.org/jira/browse/DERBY-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470603
 ] 

Mamta A. Satoor commented on DERBY-1478:
----------------------------------------

Rick, I looked at the SQL specification (Part 2) regarding SQL identifiers. For 
background, some general information on SQL identifiers from the SQL spec is as 
follows:
<Start of contents from SQL spec>
1) As per SQL specification Part 2, Section 4.2.4, the character repertoire for 
SQL identifiers, SQL_IDENTIFIER, consists of <SQL language character> Latin 
characters and digits, and all the other characters that the SQL-implementation 
supports for use in <regular identifier>. After this, everything else related 
to the SQL_IDENTIFIER character repertoire is defined as 
implementation-defined. To be specific, 
2)Section 4.2.5, Character encoding form, Pg 22 says SQL_IDENTIFIER is an 
implementation-defined character encoding form. It is applicable to the 
SQL_IDENTIFIER character repertoire.
3)Section 4.2.6, Collation, Pg 23, says SQL_IDENTIFIER is an 
implementation-defined collation. It is applicable to the SQL_IDENTIFIER 
character repertoire.
4)And lastly, in Section 4.2.7, Character Sets, SQL_IDENTIFIER is a character 
set whose repertoire is SQL_IDENTIFIER and whose character encoding form is 
SQL_IDENTIFIER. The name of its default collation is SQL_IDENTIFIER.
5)Section 4.2.3.1, Pg 19, talks about case folding. <fold> is a pair of 
functions for converting all the lower case and title case characters in a given 
string to upper case or all the upper case and title case characters to lower 
case. A lower case character is a character in the Unicode General Category 
class "Ll" and upper case character is a character in the Unicode General 
Category class "Lu".
<End of contents from SQL spec>
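
To make item 5 concrete, here is a small sketch (my own illustration, not text 
from the spec and not Derby code) that uses java.lang.Character to report which 
Unicode General Category a character falls in. The class and method names are 
made up for this example.

```java
public class CaseCategoryDemo {
    /** Returns "Ll", "Lu", "Lt", or "other" for the given code point. */
    static String category(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.LOWERCASE_LETTER: return "Ll"; // lower case
            case Character.UPPERCASE_LETTER: return "Lu"; // upper case
            case Character.TITLECASE_LETTER: return "Lt"; // title case
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(category('a'));      // Ll
        System.out.println(category('A'));      // Lu
        System.out.println(category('\u01C5')); // Lt (title case digraph)
        System.out.println(category('1'));      // other
    }
}
```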

From the information above, we see that the SQL specification leaves the CEF 
and collation for SQL identifiers as implementation-defined, but I do not see 
it saying specifically that case folding is implementation-defined. Even 
section 4.2.3.1, Pg 19, second paragraph, talks about converting case in a 
generic manner in the context of Unicode and not the English locale.

So, I am not sure why Derby/Cloudscape chose to use the English locale to do 
case conversion of SQL identifiers. Derby's StringUtil class, where the SQL 
case conversion code lies, has the following comment:
        // The functions below are used for uppercasing SQL in a consistent manner.
        // Cloudscape will uppercase Turkish to the English locale to avoid i
        // uppercasing to an uppercase dotted i. In future versions, all 
        // casing will be done in English.   The result will be that we will get
        // only the 1:1 mappings  in 
        // http://www.unicode.org/Public/3.0-Update1/UnicodeData-3.0.1.txt
        // and avoid the 1:n mappings in 
        //http://www.unicode.org/Public/3.0-Update1/SpecialCasing-3.txt
        // 
        // Any SQL casing should use these functions
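
To illustrate the Turkish "i" case that the comment refers to, here is a 
minimal sketch of how the locale changes the result of String.toUpperCase in 
Java. This is only my illustration of the general behavior, not the actual 
StringUtil code; the class and method names are hypothetical.

```java
import java.util.Locale;

public class IdentifierCasingDemo {
    /** Uppercases using the English locale, as the comment above describes. */
    static String upperEnglish(String s) {
        return s.toUpperCase(Locale.ENGLISH);
    }

    /** Uppercases using the Turkish locale, for comparison. */
    static String upperTurkish(String s) {
        return s.toUpperCase(Locale.forLanguageTag("tr-TR"));
    }

    public static void main(String[] args) {
        String identifier = "title";
        // English rules map 'i' to the plain 'I' (U+0049).
        System.out.println(upperEnglish(identifier));
        // Turkish rules map 'i' to the dotted capital I (U+0130).
        System.out.println(upperTurkish(identifier));
        // The two results are different strings for the same identifier.
        System.out.println(upperEnglish(identifier).equals(upperTurkish(identifier)));
    }
}
```

With the Turkish locale the same identifier uppercases to a different string, 
so a name stored under one locale might not match a lookup done under another.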

Dan, you mentioned in one of your comments to this Jira entry that "Currently 
the uppercasing of SQL statements and identifiers is fixed as English to avoid 
unexpected issue with other languages". Can you please explain what you mean 
by unexpected issues? Is that the same reason for recommending the same 
behavior for system tables?

> Add built in language based ordering and like processing to Derby
> -----------------------------------------------------------------
>
>                 Key: DERBY-1478
>                 URL: https://issues.apache.org/jira/browse/DERBY-1478
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.2.1
>            Reporter: Kathey Marsden
>         Assigned To: Mamta A. Satoor
>         Attachments: DERBY-1478_FunctionalSpecV1.html
>
>
> It would be good for Derby to have built in Language based ordering based on 
> locale specific Collator.
> Language based ordering is an important feature for international deployment. 
>  DERBY-533 offers one implementation option for this but according to the 
> discussion in that issue National Character Types carry a fair amount of 
> baggage with them especially in the form of concerns about conversion   to 
> and from datetime and number types. Rick  mentioned SQL language for 
> collations as an option for language based ordering. There may be other 
> options too, but I thought it worthwhile to add an issue for the high level 
> functional concern, so the best choice can be made for implementation without 
> assuming that National Character Types is the only solution.
> For possible 10.1 workaround and examples see:
> http://wiki.apache.org/db-derby/LanguageBasedOrdering

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
