[ https://issues.apache.org/jira/browse/DERBY-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12469885 ]

Mamta A. Satoor commented on DERBY-1478:
----------------------------------------

I have updated the functional spec based on some feedback from Dan. In brief, 
the changes to the spec are:
1) The collation ordering will apply only to CHAR and VARCHAR types.
2) The collation for system tables will continue to be code point based no 
matter what collation the user chooses for CHAR and VARCHAR types.
3) I will change the JDBC attribute to accept String values. This way, the 
feature can be expanded in the future to support a wider variety of collations 
than just code point based ordering and territory based ordering.

The functional spec now reads as follows:
I would like to propose a way of supporting locale sensitive data in Derby. 
Currently, up to the Derby 10.2 release, sorting for the CHAR and VARCHAR data 
types is codepoint based (Unicode). Users who need locale specific collation 
can write a couple of functions as suggested by 
http://wiki.apache.org/db-derby/LanguageBasedOrdering, but that solution is 
incomplete and inefficient (since functional indexes can't be defined in 
Derby).

My proposal for Derby 10.3 is that a user will be able to specify an optional 
JDBC url attribute, called collation, at database create time. The attribute 
can be set to one of the following 2 values:
1) UCS_BASIC (This means codepoint based collation. It will also be the 
default collation used by Derby if no collation attribute is specified on the 
JDBC url at database create time. This is the collation that Derby 10.2 and 
prior releases have supported.) or 
2) TERRITORY_BASED_COLLATION (The collation will be based on the language and 
region specified by the existing Derby attribute called territory 
(territory=ll_CC); see 
http://db.apache.org/derby/docs/10.2/ref/rrefattrib56769.html 
If the territory attribute is not specified at database create time, Derby 
10.2 uses java.util.Locale.getDefault to determine the territory for the newly 
created database. Derby 10.3 will continue to use the same mechanism to 
determine the territory.)
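
As a rough illustration of the proposed attribute, the sketch below assembles 
a database-creation URL. The attribute values are the ones named in this 
proposal and may change before release. The class name, the helper 
createUrl and the database name sampleDB are made up for the example. The 
sketch only builds the URL string and does not open a connection.

```java
import java.util.Locale;

public class CollationUrlSketch {
    // Attribute values as named in this proposal; the final names may differ.
    static final String UCS_BASIC = "UCS_BASIC";
    static final String TERRITORY_BASED = "TERRITORY_BASED_COLLATION";

    /**
     * Hypothetical helper: builds an embedded-driver URL that creates a new
     * database. A null territory or collation means the attribute is left
     * off the URL, so Derby would fall back to its defaults.
     */
    static String createUrl(String dbName, String collation, Locale territory) {
        StringBuilder url = new StringBuilder("jdbc:derby:")
                .append(dbName)
                .append(";create=true");
        if (territory != null) {
            // Locale.toString() yields the ll_CC form for language + country.
            url.append(";territory=").append(territory);
        }
        if (collation != null) {
            url.append(";collation=").append(collation);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Territory based collation for French as used in France.
        System.out.println(createUrl("sampleDB", TERRITORY_BASED, Locale.FRANCE));
        // No collation attribute: codepoint based collation would apply.
        System.out.println(createUrl("sampleDB", null, null));
    }
}
```

The resulting string would be passed to DriverManager.getConnection when the 
database is first created.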

The collation attribute will apply only to CHAR and VARCHAR columns defined 
in user-defined tables. System tables will continue to use codepoint based 
collation for their CHAR and VARCHAR columns.

This collation ordering will affect operations that depend on the ordering of 
data in CHAR and VARCHAR columns. These include:
1) Comparisons using comparison operators (<, >, =, IN, BETWEEN)
2) Statements that involve sorting (ORDER BY, GROUP BY, DISTINCT, MAX, and MIN)
3) Statements that use the LIKE keyword
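
To make the difference between the two orderings concrete, here is a small 
sketch that sorts the same values both ways. It uses java.text.Collator as a 
stand-in for territory based collation. How Derby will wire a collator into 
its own comparison code is not covered by this spec, and the class and method 
names below are made up for the example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationOrderSketch {

    /** Sorts a copy of the values in natural String order. */
    static List<String> sortByCodePoint(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        // String.compareTo compares UTF-16 code units, which matches
        // code point order for the ASCII values used in this example.
        copy.sort(null);
        return copy;
    }

    /** Sorts a copy of the values with the collator for the given territory. */
    static List<String> sortByTerritory(List<String> values, Locale territory) {
        Collator collator = Collator.getInstance(territory);
        List<String> copy = new ArrayList<>(values);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("apple", "Banana", "cherry");
        // Upper case 'B' (66) sorts before lower case 'a' (97).
        System.out.println(sortByCodePoint(names));            // [Banana, apple, cherry]
        // The collator orders by letter first and by case only as a tie-breaker.
        System.out.println(sortByTerritory(names, Locale.US)); // [apple, Banana, cherry]
    }
}
```

The same difference would show up in the result of an ORDER BY on a VARCHAR 
column, and in comparisons such as 'Banana' < 'apple'.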

Derby already has a lot of code for locale based ordering for the disabled 
NATIONAL CHAR and NATIONAL VARCHAR datatypes. I hope to reuse as much of that 
code as possible for this project. I have also set myself the goal of 
implementing this in such a way that databases with codepoint based collation 
are not penalized by the code for locale based collation.

I am not planning to implement collation support for existing databases, 
i.e. the JDBC collation attribute will not be supported at database upgrade 
time or on a pre-existing database. Those databases will continue to use 
codepoint based collation. I am proposing to implement collation support 
only for new databases.

Other than finding a means of storing the collation attribute from the JDBC url 
somewhere, I don't anticipate any other disk changes as part of this project.

> Add built in language based ordering and like processing to Derby
> -----------------------------------------------------------------
>
>                 Key: DERBY-1478
>                 URL: https://issues.apache.org/jira/browse/DERBY-1478
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.1.2.1
>            Reporter: Kathey Marsden
>         Assigned To: Mamta A. Satoor
>
> It would be good for Derby to have built in Language based ordering based on 
> locale specific Collator.
> Language based ordering is an important feature for international deployment. 
>  DERBY-533 offers one implementation option for this but according to the 
> discussion in that issue National Character Types carry a fair amount of 
> baggage with them especially in the form of concerns about conversion   to 
> and from datetime and number types. Rick  mentioned SQL language for 
> collations as an option for language based ordering. There may be other 
> options too, but I thought it worthwhile to add an issue for the high level 
> functional concern, so the best choice can be made for implementation without 
> assuming that National Character Types is the only solution.
> For possible 10.1 workaround and examples see:
> http://wiki.apache.org/db-derby/LanguageBasedOrdering

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
