[ https://issues.apache.org/jira/browse/CASSANDRA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13276274#comment-13276274 ]
Aaron Morton commented on CASSANDRA-4245:
-----------------------------------------

I was thinking about the impact of case-insensitive comparisons.

Say we have the values: aaron, Aaron, AARON, Äaron, BOB and bob. Using a case-insensitive, accent-sensitive collation the order should be (I am using bytes as a secondary ordering, and guessing that Ä sorts after the unaccented A):

1. AARON, Aaron, aaron
2. Äaron
3. BOB, bob

We need to decide whether the collation above results in three or six columns in Cassandra.

Some examples of where the comparison is used:

* When writing the sorted memtable we are not concerned with equality, only with relative ordering, which is: AARON, Aaron, aaron, Äaron, BOB, bob
* When applying a mutation to a CF we are concerned with equality; relative ordering is not important. The six values could be treated either as six unique columns or as three columns.
* When resolving a query we are concerned with both equality and relative ordering, but the equality is different to the examples above. We need to know that the three unaccented Aarons are equal, and that the Bobs occur later.

If we go with three columns, writing "AARON" then "aaron" and then reading "aaron" may result in "AARON" being returned. When reducing columns in a slice we need a deterministic way to select the column name to use in the response, and/or the response digest needs to be calculated differently.

If we go with six columns, comparators need to support a "unique ordering" that is used in memtables and sstables, and a "query ordering" that is used when slicing. In the example, query ordering results in 3 unique values and unique ordering results in 6.

I _think_ 3 columns is what we want. Thoughts?

wrt the configuration, collation could be a CF-level setting used by the comparators that support it. Per-column collation would only be used by secondary indexing and seems a little overkill.
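To make the two orderings concrete, here is a minimal sketch in plain Java. It is not Cassandra code: the class name `CollationOrderings` and the names `QUERY_ORDER`, `UNIQUE_ORDER` and `BYTE_ORDER` are made up for illustration, and it assumes a `java.text.Collator` for the root locale at `SECONDARY` strength is an acceptable stand-in for a "case-insensitive, accent-sensitive" collation. It needs Java 9+ for `Arrays.compareUnsigned`.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;
import java.util.TreeSet;

public class CollationOrderings {

    // Assumption: SECONDARY strength ignores case but keeps accent differences.
    private static final Collator COLLATOR = newCollator();

    private static Collator newCollator() {
        Collator c = Collator.getInstance(Locale.ROOT);
        c.setStrength(Collator.SECONDARY);
        c.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return c;
    }

    // "Query ordering": names that differ only by case compare as equal.
    static final Comparator<String> QUERY_ORDER = (a, b) -> COLLATOR.compare(a, b);

    // Unsigned comparison of the UTF-8 bytes, used only as a tie-breaker.
    static final Comparator<String> BYTE_ORDER = (a, b) -> Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8), b.getBytes(StandardCharsets.UTF_8));

    // "Unique ordering": collation first, then bytes, so no two distinct names are equal.
    static final Comparator<String> UNIQUE_ORDER = QUERY_ORDER.thenComparing(BYTE_ORDER);

    // Returns a copy of the names sorted by the unique ordering.
    static List<String> sortUnique(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(UNIQUE_ORDER);
        return sorted;
    }

    // Counts how many distinct columns the given ordering would produce.
    static int distinctCount(Comparator<String> order, List<String> names) {
        TreeSet<String> set = new TreeSet<>(order);
        set.addAll(names);
        return set.size();
    }

    public static void main(String[] args) {
        // \u00C4 is the letter A with diaeresis, escaped to avoid source-encoding issues.
        List<String> names = Arrays.asList("bob", "aaron", "BOB", "\u00C4aron", "Aaron", "AARON");

        System.out.println(sortUnique(names));
        System.out.println(distinctCount(QUERY_ORDER, names));  // three-column model
        System.out.println(distinctCount(UNIQUE_ORDER, names)); // six-column model

        // Three-column model: a later write with different case updates the value,
        // but the sorted map keeps the name that was written first.
        TreeMap<String, String> row = new TreeMap<>(QUERY_ORDER);
        row.put("AARON", "v1");
        row.put("aaron", "v2");
        System.out.println(row.firstKey() + " -> " + row.get("aaron"));
    }
}
```

The `TreeMap` at the end mimics the three-column case: the second write replaces the value but the map keeps the first-written key, which is the "write aaron, read back AARON" effect described above. A real comparator would also need a deterministic rule for which name wins across replicas, which this sketch does not attempt.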
> Provide a UT8Type (case insensitive) comparator
> -----------------------------------------------
>
>                 Key: CASSANDRA-4245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4245
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Ertio Lew
>            Priority: Minor
>
> It is a common use case to use a bunch of entity names as column names & then
> use the row as a search index, using search by range. For such use cases &
> others, it is useful to have a UTF8 comparator that provides case insensitive
> ordering of columns.