[
https://issues.apache.org/jira/browse/CASSANDRA-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sidharth updated CASSANDRA-4920:
--------------------------------
Description:
Adding a way to sort UTF8 strings using the collation semantics described below
would be useful.
Use case: Say you have wide rows where you cannot use Cassandra's standard
indexes (secondary/primary index). Say each column has a string value that is
either alphanumeric or purely numeric, and you want an index by value. More
specifically, you want to slice a range over a set of column values and say
"get me all the IDs associated with values ABC to XYZ".
As usual, I would index these values in a materialized view. More
specifically, I create an index CF, add these values into a CompositeType
column, and SliceRange over them for the indexing to work. I don't really care
whether a value is alphabetic or numeric, as long as the values are ordered by
the following collation semantics:
1) If the string is numeric, it should compare like a number.
2) If it is alphabetic, it should compare like a normal string.
3) If it is alphanumeric, each contiguous sequence of digits in the string
should be compared as a number, e.g. "c10" > "c2".
4) UTF8 type strings are assumed everywhere.
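The four rules above can be sketched as a sort key. This is only an
illustration in Python, not Cassandra code and not a proposed AbstractType
implementation. The names collation_key and slice_range are hypothetical, and
several choices are assumptions the description does not settle: only ASCII
digits 0-9 count as numbers, a digit run sorts before text at the same
position, and strings whose chunks compare equal (such as "c02" and "c2") are
tie-broken by plain string order.

```python
import re
from bisect import bisect_left, bisect_right

# Split a string into maximal runs of ASCII digits and runs of everything else.
_CHUNK = re.compile(r"[0-9]+|[^0-9]+")


def collation_key(s):
    """Return a sort key implementing the numeric-aware ordering.

    Digit runs become (0, int) and text runs become (1, str), so two chunks
    are only ever compared int-to-int or str-to-str. The original string is
    appended as a tie-breaker (an assumption, see the lead-in).
    """
    parts = tuple(
        (0, int(chunk)) if "0" <= chunk[0] <= "9" else (1, chunk)
        for chunk in _CHUNK.findall(s)
    )
    return (parts, s)


def slice_range(values, start, end):
    """Return the values between start and end (inclusive) in collation order.

    This mimics a range slice over an ordered index using an in-memory list.
    """
    ordered = sorted(values, key=collation_key)
    keys = [collation_key(v) for v in ordered]
    lo = bisect_left(keys, collation_key(start))
    hi = bisect_right(keys, collation_key(end))
    return ordered[lo:hi]


if __name__ == "__main__":
    values = ["c10", "c2", "12", "10000", "abc", "b"]
    print(sorted(values, key=collation_key))
    # ['12', '10000', 'abc', 'b', 'c2', 'c10']
    print(slice_range(values, "c1", "c9"))
    # ['c2']
```

With plain string ordering the same input would put "10000" before "12" and
"c10" before "c2", which is the behaviour this request is trying to avoid.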
How does this help?
1) You don't end up creating multiple CFs for different value types.
2) You don't have to write boilerplate to do complicated type detection and
do the ordering manually in the application.
was:
Adding a way to sort UTF8 based on a standard order (collation) is very useful.
Say for example you have wide rows where you cannot use Cassandra's standard
indexes (secondary/primary index). Say each column has a string value that
is either alphanumeric or purely numeric.
Now say I want to index these values in a materialized view so I can
look up things by range of values (a range makes sense given a standard
ordering over my alphanumeric and numeric strings, i.e. "12" < "10000").
More specifically, I add these values into a CompositeType and SliceRange over
them for the index to work. I don't really care whether a value is alphabetic
or numeric; it should be in the order given by the following collation
semantics:
1) If the string is numeric, it should compare like a number.
2) If it is alphabetic, it should compare like a normal string.
3) If it is alphanumeric, each contiguous sequence of digits in the string
should be compared as a number, e.g. "c10" > "c2".
4) UTF8 type strings are assumed everywhere.
> Add Collation to abstract type to provide standard sort order for Strings
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-4920
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4920
> Project: Cassandra
> Issue Type: Improvement
> Components: API, Core
> Affects Versions: 1.2.0 beta 1
> Reporter: Sidharth
> Priority: Minor
> Labels: cassandra
>