[jira] [Commented] (CASSANDRA-4245) Provide a UT8Type (case insensitive) comparator

Aaron Morton (JIRA) Tue, 15 May 2012 15:11:34 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276274#comment-13276274 ]


Aaron Morton commented on CASSANDRA-4245:
-----------------------------------------

Was thinking about the impact of case insensitive comparisons.

Say we have the values: aaron, Aaron, AARON, Äaron, Bob and bob. Using a Case 
Insensitive, Accent Sensitive collation the order should be (am using bytes as 
a secondary ordering, and guessing Ä occurs after the non-accented A):

1. AARON, Aaron, aaron
2. Äaron
3. Bob, bob
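
To make the ordering above concrete, here is a rough sketch in Python. It is 
not Cassandra's comparator API; query_key and unique_key are hypothetical 
helpers, and stripping combining marks after NFD normalisation is only an 
approximation of a real collation (it assumes Ä sorts right after the 
non-accented A, as guessed above).

```python
import unicodedata

def query_key(name):
    """Case-insensitive, accent-sensitive sort key (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    # Primary level: base letters only, combining accents stripped.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Secondary level: accents kept, so "Äaron" differs from "aaron".
    return (base, decomposed)

def unique_key(name):
    """Query key with the raw UTF-8 bytes as the final tie-break."""
    return (*query_key(name), name.encode("utf-8"))

names = ["aaron", "Aaron", "AARON", "Äaron", "Bob", "bob"]
ordered = sorted(names, key=unique_key)
print(ordered)
```

With the byte tie-break the upper case forms sort before the lower case ones, 
which gives the within-group order shown in the list.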

We need to decide if the collation above results in three or six columns in 
Cassandra. 

Some examples of where the comparison is used:
 * When writing the sorted memtable we are not concerned with equality, only 
relative ordering, which is: AARON, Aaron, aaron, Äaron, Bob, bob. 
 * When applying a mutation to a CF we are concerned with equality; relative 
ordering is not important. The six columns should be treated either as six 
unique values or as three columns. 
 * When resolving a query we are concerned with both equality and relative 
ordering, but the equality is different from the examples above. We need to 
know that the three non-accented Aarons are equal, and that the Bobs occur 
later. 

If three columns, writing "AARON" then "aaron" then reading "aaron" may result 
in "AARON" being returned. When reducing columns in a slice we need a 
deterministic way to select the column name to use in the response, and/or the 
response digest needs to be calculated differently.  
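
One possible rule for the three column case, sketched in Python. Everything 
here is an assumption for illustration: reduce_columns and query_key are 
hypothetical names, the value is taken from the newest timestamp, and the 
name returned is the byte-wise smallest one seen, so the result does not 
depend on the order the versions arrive in.

```python
import unicodedata

def query_key(name):
    """Case-insensitive, accent-sensitive key (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)

def reduce_columns(versions):
    """Collapse (name, value, timestamp) versions whose names compare equal."""
    groups = {}
    for name, value, timestamp in versions:
        groups.setdefault(query_key(name), []).append((name, value, timestamp))
    result = []
    for key in sorted(groups):
        group = groups[key]
        # Assumed rule: newest timestamp wins, value breaks timestamp ties.
        newest = max(group, key=lambda v: (v[2], v[1]))
        # Assumed rule: report the byte-wise smallest name in the group.
        canonical = min((v[0] for v in group), key=lambda n: n.encode("utf-8"))
        result.append((canonical, newest[1]))
    return result

writes = [("AARON", "v1", 1), ("aaron", "v2", 2)]
print(reduce_columns(writes))
```

Under this rule a read for "aaron" gets the value written under "aaron" but 
the name "AARON", which is the surprising case described above.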
 
If six columns, comparators need to support a "unique ordering" that is used in 
memtables and sstables, and a "query ordering" used when slicing. In the 
example, query ordering results in 3 unique values and unique ordering results 
in 6.  
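
A small Python sketch of the two orderings, under the same assumptions as the 
example values (query_key, unique_key and slice_equal are hypothetical names, 
not anything in Cassandra): storage is sorted by the unique ordering, and a 
slice matches on the query ordering only.

```python
import unicodedata

def query_key(name):
    """Query ordering: case-insensitive, accent-sensitive."""
    decomposed = unicodedata.normalize("NFD", name.casefold())
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base, decomposed)

def unique_key(name):
    """Unique ordering: query ordering, then raw UTF-8 bytes."""
    return (*query_key(name), name.encode("utf-8"))

def slice_equal(stored, name):
    """Return every stored name that is equal under the query ordering."""
    wanted = query_key(name)
    return [n for n in stored if query_key(n) == wanted]

names = ["aaron", "Aaron", "AARON", "Äaron", "Bob", "bob"]
stored = sorted(names, key=unique_key)  # memtable / sstable order

unique_count = len({unique_key(n) for n in names})
query_count = len({query_key(n) for n in names})
print(unique_count, query_count)
print(slice_equal(stored, "aaron"))
```

Here all six names survive in storage, and a slice for "aaron" has to decide 
what to do with the three stored columns that match it.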

I _think_ 3 columns is what we want. Thoughts? 

wrt the configuration, collation could be a CF-level setting used by 
comparators that support it. Per-column collation would only be used by 
secondary indexing and seems like overkill. 
                
> Provide a UT8Type (case insensitive) comparator
> -----------------------------------------------
>
>                 Key: CASSANDRA-4245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Ertio Lew
>            Priority: Minor
>
> It is a common use case to use a bunch of entity names as column names & then 
> use the row as a search index, using search by range. For such use cases & 
> others, it is useful to have a UTF8 comparator that provides case insensitive 
> ordering of columns.

