Lucene based Secondary Indexes
------------------------------
Key: CASSANDRA-2915
URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
Project: Cassandra
Issue Type: New Feature
Components: Core
Reporter: T Jake Luciani
Fix For: 1.0
Secondary indexes (type KEYS) suffer from a number of limitations in their
current form:
- Multiple IndexClauses only work when there is a subset of rows under the
highest clause
- One new column family is created per index, which means 10 new CFs for 10
secondary indexes
This ticket will use the Lucene library to implement secondary indexes as one
index per CF, and use the Lucene query engine to handle multiple index
clauses. Also, by using Lucene we get a highly optimized file format.
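
To illustrate why a single index per CF makes multiple index clauses easier to
handle, here is a minimal plain-Java sketch. It is not Lucene code and not the
proposed implementation: it keeps one inverted index for the whole column
family and answers a multi-clause query by intersecting the matching row-key
sets. The names SingleIndexSketch, index and query are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SingleIndexSketch {
    // One index for the whole column family: "column=value" -> matching row keys.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Record that the given row has the given value in the given column.
    void index(String rowKey, String column, String value) {
        postings.computeIfAbsent(column + "=" + value, k -> new TreeSet<String>())
                .add(rowKey);
    }

    // Return the row keys that satisfy every clause (logical AND of equalities).
    Set<String> query(Map<String, String> clauses) {
        Set<String> result = null;
        for (Map.Entry<String, String> clause : clauses.entrySet()) {
            Set<String> rows = postings.get(clause.getKey() + "=" + clause.getValue());
            if (rows == null) {
                return new TreeSet<String>(); // one clause matches nothing
            }
            if (result == null) {
                result = new TreeSet<String>(rows); // copy so the index is not mutated
            } else {
                result.retainAll(rows); // intersect with the next clause
            }
        }
        return result == null ? new TreeSet<String>() : result;
    }
}
```

Because every clause is resolved against the same index, no clause has to be
treated as the "highest" one and no extra column family is needed per indexed
column. A real query engine such as Lucene's would do this over its own
posting lists rather than in-memory sets.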
There are a few parallels we can draw between Cassandra and Lucene.
Lucene indexes segments in memory and then flushes them to disk, so we can sync
our memtable flushes with Lucene flushes. Lucene also has optimize(), which
corresponds to our compaction process, so these can be synced as well.
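
One way to picture the flush synchronization is a listener that commits the
index buffer whenever the memtable is flushed. The sketch below uses plain
Java stand-ins only; FlushListener, MemtableStub and IndexStub are hypothetical
names and do not correspond to Cassandra or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushSyncSketch {
    /** Hypothetical hook invoked whenever the memtable is flushed. */
    interface FlushListener {
        void onFlush();
    }

    /** Stand-in for a memtable: buffers rows in memory until flush(). */
    static final class MemtableStub {
        private final List<String> buffered = new ArrayList<>();
        private final List<FlushListener> listeners = new ArrayList<>();
        int rowsOnDisk = 0;

        void addListener(FlushListener listener) {
            listeners.add(listener);
        }

        void put(String row) {
            buffered.add(row);
        }

        // Persist the buffered rows, then tell every listener to flush too.
        void flush() {
            rowsOnDisk += buffered.size();
            buffered.clear();
            for (FlushListener listener : listeners) {
                listener.onFlush();
            }
        }
    }

    /** Stand-in for an index writer: buffers documents until the memtable flushes. */
    static final class IndexStub implements FlushListener {
        private final List<String> buffered = new ArrayList<>();
        int docsOnDisk = 0;

        void add(String doc) {
            buffered.add(doc);
        }

        @Override
        public void onFlush() {
            docsOnDisk += buffered.size();
            buffered.clear();
        }
    }
}
```

With this arrangement the on-disk index never runs ahead of or behind the
on-disk rows by more than one flush. The same hook style could, as an
assumption, be applied to compaction and optimize().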
We will also need to map column validators to Lucene tokenizers so the data
can be stored properly. The big win is that once this is done we can perform
complex queries within a column, such as wildcard searches.
The downside of this approach is that we will need to read before write, since
documents in Lucene are written as complete documents. For random workloads
with lots of indexed columns, this means we need to read the document from the
index, update it, and write it back.
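
The read-before-write cost can be sketched in plain Java as follows. This is
an illustration under stated assumptions, not Lucene code: the "index" is a
map from row key to a complete document, and the names ReadBeforeWriteSketch
and updateColumn are hypothetical. Changing a single indexed column still
costs one read and one full-document write.

```java
import java.util.HashMap;
import java.util.Map;

final class ReadBeforeWriteSketch {
    // Stand-in for the index: row key -> complete document (column -> value).
    private final Map<String, Map<String, String>> documents = new HashMap<>();
    int reads = 0;
    int writes = 0;

    // Update one column of a row's document.
    void updateColumn(String rowKey, String column, String value) {
        // 1. Read the existing document; this is the extra read.
        reads++;
        Map<String, String> current = documents.get(rowKey);

        // 2. Copy it and apply the single-column change.
        Map<String, String> updated = new HashMap<>();
        if (current != null) {
            updated.putAll(current);
        }
        updated.put(column, value);

        // 3. Write the whole document back, replacing the old one.
        writes++;
        documents.put(rowKey, updated);
    }

    Map<String, String> get(String rowKey) {
        return documents.get(rowKey);
    }
}
```

Every call to updateColumn increments both counters, which is the overhead
described above for workloads that update indexed columns at random.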