Refactor index definitions
--------------------------
Key: CASSANDRA-1601
URL: https://issues.apache.org/jira/browse/CASSANDRA-1601
Project: Cassandra
Issue Type: Improvement
Components: API
Reporter: Stu Hood
Priority: Critical
Fix For: 0.7.0
h3. Overview
There are a few considerations for defining secondary indexes and row
validation that I don't think have been brought up yet. While the interface is
still malleable before 0.7.0, we should attempt to make changes that allow for
forward compatibility of index/validator schemas. This is an umbrella ticket
for suggesting/debating the changes; other tickets should be opened for quick
improvements that can be made before 0.7.0.
----
h3. Index output types
The output (queryable) data from an indexing operation is what actually goes in
the index. For a particular row, the output can be either _single-valued_,
_multi-valued_ or _compound_:
* Single-valued
** Implemented in trunk (special case of multi-valued)
* Multi-valued
** Multiple index values _of the same type_ can match a single row
** Row probably contains a list/set (perhaps in a supercolumn)
* Compound
** Multiple base properties concatenated as one index entry
** Different validators/comparators for each component
** (Given the simplicity of performing boolean operations on CASSANDRA-1472
indexes, compound local indexes are unlikely to ever be worthwhile, but compound
distributed indexes will be: see comments on CASSANDRA-1599)
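To make the three output shapes concrete, here is a minimal Python sketch. It is not Cassandra code: the {{Row}} model (a plain dict of column name to value) and the function names {{single_valued}}, {{multi_valued}} and {{compound}} are hypothetical, chosen only to illustrate what ends up in the index for one row.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical row model for illustration: column name -> value.
Row = Dict[str, Any]


def multi_valued(row: Row, column: str) -> List[Any]:
    """Several index values of the same type may match a single row."""
    value = row.get(column)
    if value is None:
        return []
    if isinstance(value, (list, set, tuple)):
        # The column holds a list/set: every element becomes an index entry.
        return sorted(value)
    return [value]


def single_valued(row: Row, column: str) -> List[Any]:
    """Special case of multi-valued: at most one index entry per row."""
    return multi_valued(row, column)[:1]


def compound(row: Row, columns: List[str]) -> List[Tuple[Any, ...]]:
    """Several base properties concatenated into one index entry."""
    if any(name not in row for name in columns):
        # Assumption of this sketch: a row missing a component is not indexed.
        return []
    return [tuple(row[name] for name in columns)]
```

In this sketch each component of a compound entry keeps its own type, which is where per-component validators/comparators would come in.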
h3. Index input types
The other end of indexing is selection of values from a row to be indexed.
Selection can correspond directly to our current {{db.filter.*}}
implementations, and may be best implemented by specifying the validator/index
using the same Thrift objects you would use for a similar query:
* Name selection
** Implemented in trunk, but should probably just be a special case of list
selection below
** Corresponds to {{db.filter.NamesQueryFilter}} of size 1
* List selection
** Should specify a list of columns of which all values must be of the same
type, as defined by the Validator
** Corresponds to {{db.filter.NamesQueryFilter}}
* Range (prefix?) selection
** Subsets of a row may be interesting for indexing
** Range corresponds to {{db.filter.SliceQueryFilter}}
*** (A Prefix might actually be more useful for indexing, but is better
implemented by indexing an arbitrarily nested row)
** Open question: might the ability to index only the 'top N values' from a row
be useful? If so, then this selector should allow N to be specified like it
would be for a slice
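The selectors above can be sketched in Python as follows. This is a rough illustration, not the {{db.filter.*}} implementation: {{select_names}}, {{select_name}} and {{select_range}} are hypothetical stand-ins, the row is modelled as a dict ordered by column name, and the handling of bounds (inclusive, with an empty string meaning unbounded) is an assumption of the sketch.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical row model for illustration: column name -> value.
Row = Dict[str, Any]
Column = Tuple[str, Any]


def select_names(row: Row, names: Iterable[str]) -> List[Column]:
    """List selection: pick the named columns that exist in the row."""
    return [(name, row[name]) for name in sorted(names) if name in row]


def select_name(row: Row, name: str) -> List[Column]:
    """Name selection: list selection with a list of size 1."""
    return select_names(row, [name])


def select_range(row: Row, start: str = "", finish: str = "",
                 count: Optional[int] = None) -> List[Column]:
    """Range selection: columns between start and finish, in name order.

    Assumptions of this sketch: bounds are inclusive, an empty string
    means unbounded, and count (the 'N' of the open question) limits
    the number of columns returned.
    """
    selected = [
        (name, value)
        for name, value in sorted(row.items())
        if name >= start and (finish == "" or name <= finish)
    ]
    return selected if count is None else selected[:count]
```

For example, {{select_range(row, "tag:", "tag:~", count=2)}} would pick the first two columns whose names start with {{tag:}}, which approximates the prefix case with a range.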
h3. Supercolumns/arbitrary-nesting
Another consideration is that we should be able to support indexing and
validation of supercolumns (and hence, arbitrarily nested rows). Since the
selection of columns to index is essentially the same as the selection of
columns to return for a query, this can probably mirror (and suggest
improvements to) our query API.
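As a rough illustration of sharing one selector between queries and index definitions, the Python sketch below treats an arbitrarily nested row as nested dicts and selects by path. The {{select_path}} function and the nested-dict model are hypothetical and do not reflect any existing API.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical model: a nested row is a dict whose values may be dicts.
NestedRow = Dict[str, Any]


def select_path(row: NestedRow, path: Sequence[str]) -> Optional[Any]:
    """Walk a path of column names down a nested row.

    Returns the value (or sub-row) at the end of the path, or None if
    any step of the path is missing.
    """
    node: Any = row
    for name in path:
        if not isinstance(node, dict) or name not in node:
            return None
        node = node[name]
    return node
```

A path of length 1 addresses a standard column, a path of length 2 a subcolumn inside a supercolumn, and longer paths cover deeper nesting, so the same selector could serve both a query and an index or validator definition.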
h3. UDFs
This is obviously still an open area, but user-defined indexing functions are
essentially a transform between the _input_ and _output_ (as defined above),
which would otherwise have the same structure. Leaving room for UDFs in our index
definitions makes sense, and will likely lead to a much more general and
elegant design.
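One way to picture "leaving room for UDFs" is an index definition that carries an optional transform between selection and output, defaulting to the identity. The Python sketch below is only an illustration: {{IndexDefinition}}, its fields, and the {{entries}} method are hypothetical names, not a proposed schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical row model for illustration: column name -> value.
Row = Dict[str, Any]
Transform = Callable[[List[Any]], List[Any]]


def identity(values: List[Any]) -> List[Any]:
    """Default transform: input and output have the same structure."""
    return values


@dataclass
class IndexDefinition:
    columns: List[str]               # input: which columns to select
    transform: Transform = identity  # UDF slot between input and output

    def entries(self, row: Row) -> List[Any]:
        """Select the input values, then apply the transform."""
        selected = [row[name] for name in self.columns if name in row]
        return self.transform(selected)
```

Without a UDF the selected values are indexed as they are; with one, for example {{IndexDefinition(["email"], lambda vs: [v.lower() for v in vs])}}, the output can differ from the input in value or in structure.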