[
https://issues.apache.org/jira/browse/CASSANDRA-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis updated CASSANDRA-1601:
--------------------------------------
Priority: Major (was: Critical)
Fix Version/s: 0.8 (was: 0.7.0)
This is a huge amount of feature creep to jam in at the end of 0.7. (Nor do I
think indexing supercolumn data is even desirable.) Pushing to 0.8.
> Refactor index definitions
> --------------------------
>
> Key: CASSANDRA-1601
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
> Project: Cassandra
> Issue Type: Improvement
> Components: API
> Reporter: Stu Hood
> Fix For: 0.8
>
>
> h3. Overview
> There are a few considerations for defining secondary indexes and row
> validation that I don't think have been brought up yet. While the interface
> is still malleable pre-0.7.0, we should attempt to make changes that allow
> for forwards compatibility of index/validator schemas. This is an umbrella
> ticket for suggesting/debating the changes: other tickets should be opened
> for quick improvements that can be made before 0.7.0.
> ----
> h3. Index output types
> The output (queryable) data from an indexing operation is what actually goes
> in the index. For a particular row, the output can be either _single-valued_,
> _multi-valued_ or _compound_:
> * Single-valued
> ** Implemented in trunk (special case of multi-valued)
> * Multi-valued
> ** Multiple index values _of the same type_ can match a single row
> ** Row probably contains a list/set (perhaps in a supercolumn)
> * Compound
> ** Multiple base properties concatenated as one index entry
> ** Different validators/comparators for each component
> ** (Given the simplicity of performing boolean operations on CASSANDRA-1472
> indexes, compound local indexes are unlikely to ever be worthwhile, but
> compound distributed indexes will be: see comments on CASSANDRA-1599)
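The three output shapes above can be sketched in plain Python. This is only
an illustration, not Cassandra code: the row is modelled as a dict of column
name to value, and the helper names (single_valued, multi_valued, compound)
are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumption for illustration: a row is a mapping of column name -> value.
Row = Dict[str, Any]


def multi_valued(row: Row, columns: Iterable[str]) -> List[Any]:
    """Several index entries of the same type, all pointing at one row."""
    return [row[c] for c in columns if c in row]


def single_valued(row: Row, column: str) -> List[Any]:
    """At most one entry per row: the multi-valued case with one column."""
    return multi_valued(row, [column])


def compound(row: Row, columns: Iterable[str]) -> List[Tuple[Any, ...]]:
    """One entry built by concatenating several base properties.

    In this sketch a row that lacks any component yields no entry.
    """
    names = list(columns)
    if all(c in row for c in names):
        return [tuple(row[c] for c in names)]
    return []


example_row = {"state": "TX", "city": "Austin", "tag1": "a", "tag2": "b"}
single_entries = single_valued(example_row, "state")
multi_entries = multi_valued(example_row, ["tag1", "tag2"])
compound_entries = compound(example_row, ["state", "city"])
```

A real compound index would also need a validator/comparator per component,
as the ticket notes; the tuple here only stands in for the concatenated key.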
> h3. Index input types
> The other end of indexing is selection of values from a row to be indexed.
> Selection can correspond directly to our current {{db.filter.*}}
> implementations, and may be best implemented by specifying the
> validator/index using the same Thrift objects you would use for a similar
> query:
> * Name selection
> ** Implemented in trunk, but should probably just be a special case of list
> selection below
> ** Corresponds to db.filter.NamesQueryFilter of size 1
> * List selection
> ** Should specify a list of columns of which all values must be of the same
> type, as defined by the Validator
> ** Corresponds to db.filter.NamesQueryFilter
> * Range (prefix?) selection
> ** Subsets of a row may be interesting for indexing
> ** Range corresponds to db.filter.SliceQueryFilter
> *** (A Prefix might actually be more useful for indexing, but is better
> implemented by indexing an arbitrarily nested row)
> ** Open question: might the ability to index only the 'top N values' from a
> row be useful? If so, then this selector should allow N to be specified like
> it would be for a slice
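A minimal sketch of the three selectors, again in plain Python rather than
against the real db.filter classes. The function names and the dict-based row
are assumptions; the range selector treats an empty bound as open and both
bounds as inclusive, and takes an optional count for the "top N" question.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumption for illustration: column names and values are plain strings.
Row = Dict[str, str]
Column = Tuple[str, str]


def select_names(row: Row, names: Sequence[str]) -> List[Column]:
    """List selection: the named columns that exist, in the order requested."""
    return [(n, row[n]) for n in names if n in row]


def select_name(row: Row, name: str) -> List[Column]:
    """Name selection: list selection with a list of size 1."""
    return select_names(row, [name])


def select_range(
    row: Row, start: str = "", finish: str = "", count: Optional[int] = None
) -> List[Column]:
    """Range selection over columns in sorted name order.

    An empty start or finish leaves that side unbounded; count, if given,
    keeps only the first N matching columns.
    """
    selected = [
        (n, row[n])
        for n in sorted(row)
        if (not start or n >= start) and (not finish or n <= finish)
    ]
    return selected if count is None else selected[:count]


example_row = {"a:1": "x", "a:2": "y", "b:1": "z"}
prefix_like = select_range(example_row, start="a:", finish="a:~")
top_two = select_range(example_row, count=2)
```

Expressing a prefix as the range "a:" to "a:~" only works for this toy string
ordering; it is here to show why a dedicated prefix selector might be cleaner.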
> h3. Supercolumns/arbitrary-nesting
> Another consideration is that we should be able to support indexing and
> validation of supercolumns (and hence, arbitrarily nested rows). Since the
> selection of columns to index is essentially the same as the selection of
> columns to return for a query, this can probably mirror (and suggest
> improvements to) our query API.
> h3. UDFs
> This is obviously still an open area, but user-defined indexing functions are
> essentially a transform between the _input_ and _output_ (as defined above),
> which would normally have equal structures. Leaving room for UDFs in our
> index definitions makes sense, and will likely lead to a much more general
> and elegant design.
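One way to picture "leaving room for UDFs" is an index definition made of a
selector (the input) plus an optional transform (the UDF) that produces the
index entries (the output). The sketch below is hypothetical throughout: none
of these names exist in Cassandra, and the default transform is the identity,
so input and output keep the same structure.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Column = Tuple[str, str]
Selector = Callable[[Row], List[Column]]
Transform = Callable[[List[Column]], List[str]]


def identity(columns: List[Column]) -> List[str]:
    """Default transform: index the selected values unchanged."""
    return [value for _, value in columns]


def build_entries(
    row: Row, selector: Selector, transform: Transform = identity
) -> List[str]:
    """Apply the selector to the row, then the transform to its output."""
    return transform(selector(row))


def lowercase_words(columns: List[Column]) -> List[str]:
    """Example UDF: split values into distinct lowercase words, sorted."""
    return sorted({word.lower() for _, value in columns for word in value.split()})


def title_and_body(row: Row) -> List[Column]:
    """Example selector: a fixed list of column names."""
    return [(n, row[n]) for n in ("title", "body") if n in row]


example_row = {"title": "Refactor Index definitions", "body": "index UDFs"}
plain_entries = build_entries(example_row, title_and_body)
word_entries = build_entries(example_row, title_and_body, lowercase_words)
```

With the identity transform this reduces to the multi-valued case; with
lowercase_words the same selector feeds a different set of index entries.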