[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
T Jake Luciani reassigned CASSANDRA-2915:
-----------------------------------------
Assignee: Jason Rutherglen
> Lucene based Secondary Indexes
> ------------------------------
>
> Key: CASSANDRA-2915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: T Jake Luciani
> Assignee: Jason Rutherglen
> Labels: secondary_index
> Fix For: 1.0
>
>
> Secondary indexes (of type KEYS) suffer from a number of limitations in their
> current form:
> - Multiple IndexClauses only work when there is a subset of rows under the
> highest clause
> - One new column family is created per index; this means 10 new CFs for 10
> secondary indexes
> This ticket will use the Lucene library to implement secondary indexes as one
> index per CF, and utilize the Lucene query engine to handle multiple index
> clauses. Also, by using Lucene we get a highly optimized file format.
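Here is a minimal sketch of the "one index per CF" idea, written in plain Java rather than against Lucene. Every indexed column of a row is stored as a `column:value` term in a single inverted index. Multiple IndexClauses then become a conjunction, evaluated by intersecting posting lists starting with the smallest. The class `CfIndexSketch` and its methods are hypothetical. The sketch handles equality clauses only and approximates what a Lucene `BooleanQuery` with required term clauses would do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: a single inverted index per column family.
// Terms are "column:value"; this is NOT the Lucene API.
public class CfIndexSketch {
    // term -> row keys whose indexed column holds that value
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // All indexed columns of a row go into the same index (no per-index CF).
    public void indexRow(String rowKey, Map<String, String> columns) {
        for (Map.Entry<String, String> e : columns.entrySet()) {
            String term = e.getKey() + ":" + e.getValue();
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(rowKey);
        }
    }

    // Equality clauses combined with AND: intersect posting lists, smallest first.
    public SortedSet<String> query(Map<String, String> clauses) {
        List<TreeSet<String>> lists = new ArrayList<>();
        for (Map.Entry<String, String> c : clauses.entrySet()) {
            TreeSet<String> p = postings.get(c.getKey() + ":" + c.getValue());
            if (p == null) {
                return new TreeSet<>(); // one clause matches nothing, so the AND is empty
            }
            lists.add(p);
        }
        if (lists.isEmpty()) {
            return new TreeSet<>();
        }
        lists.sort((a, b) -> Integer.compare(a.size(), b.size()));
        TreeSet<String> result = new TreeSet<>(lists.get(0)); // copy; don't mutate the index
        for (int i = 1; i < lists.size(); i++) {
            result.retainAll(lists.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        CfIndexSketch idx = new CfIndexSketch();
        idx.indexRow("row1", Map.of("state", "TX", "birth_year", "1970"));
        idx.indexRow("row2", Map.of("state", "TX", "birth_year", "1980"));
        idx.indexRow("row3", Map.of("state", "CA", "birth_year", "1970"));
        System.out.println(idx.query(Map.of("state", "TX", "birth_year", "1970"))); // [row1]
    }
}
```

Intersecting from the smallest list means the most selective clause bounds the work, and no clause has to be designated up front as the one that drives the scan.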
> There are a few parallels we can draw between Cassandra and Lucene.
> Lucene indexes segments in memory then flushes them to disk so we can sync
> our memtable flushes to Lucene flushes. Lucene also has optimize(), which
> correlates to our compaction process, so these can be sync'd as well.
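The following sketch shows one way to keep the index in sync with flushes and compaction. It assumes hypothetical `onMemtableFlush()` and `onCompaction()` hooks, which are not Cassandra or Lucene APIs. On each memtable flush, the buffered index terms are frozen into an immutable segment. On compaction, all segments are merged into one, which is the role `optimize()` plays in Lucene. Deletes and overwrites are not modeled, to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: index segments kept in lockstep with memtable flushes
// and compactions. Hook names are illustrative, not Cassandra/Lucene APIs.
public class SegmentSyncSketch {
    // In-memory index buffer, filled alongside the memtable.
    private Map<String, Set<String>> buffer = new HashMap<>();
    // Immutable flushed segments, one per memtable flush.
    private final List<Map<String, Set<String>>> segments = new ArrayList<>();

    public void write(String rowKey, String term) {
        buffer.computeIfAbsent(term, t -> new TreeSet<>()).add(rowKey);
    }

    // Memtable flush: the buffered terms become a new immutable segment.
    public void onMemtableFlush() {
        if (buffer.isEmpty()) {
            return;
        }
        segments.add(Collections.unmodifiableMap(buffer));
        buffer = new HashMap<>();
    }

    // Compaction: merge every segment into one, like Lucene's optimize().
    // Deletes/overwrites are not modeled, so merging is a plain union.
    public void onCompaction() {
        Map<String, Set<String>> merged = new HashMap<>();
        for (Map<String, Set<String>> seg : segments) {
            for (Map.Entry<String, Set<String>> e : seg.entrySet()) {
                merged.computeIfAbsent(e.getKey(), t -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        segments.clear();
        if (!merged.isEmpty()) {
            segments.add(Collections.unmodifiableMap(merged));
        }
    }

    public int segmentCount() {
        return segments.size();
    }

    // Searches flushed segments only; unflushed buffer contents are not visible.
    public Set<String> search(String term) {
        Set<String> out = new TreeSet<>();
        for (Map<String, Set<String>> seg : segments) {
            out.addAll(seg.getOrDefault(term, Collections.emptySet()));
        }
        return out;
    }

    public static void main(String[] args) {
        SegmentSyncSketch s = new SegmentSyncSketch();
        s.write("row1", "state:TX");
        s.onMemtableFlush();
        s.write("row2", "state:TX");
        s.onMemtableFlush();
        System.out.println(s.segmentCount()); // 2
        s.onCompaction();
        System.out.println(s.segmentCount() + " " + s.search("state:TX")); // 1 [row1, row2]
    }
}
```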
> We will also need to correlate column validators to Lucene tokenizers so the
> data can be stored properly. The big win is that once this is done we can
> perform complex queries within a column, like wildcard searches.
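Here is a rough sketch of validator-driven tokenization combined with a wildcard search over the term dictionary. The pairing of validator names (`UTF8Type`, `AsciiType`, `LongType`) with tokenization rules is an assumption made for illustration, as are the class and method names. `*` and `?` follow the usual wildcard convention (any run of characters, exactly one character) and are translated into a regular expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: pick a tokenizer from the column validator, index the
// resulting terms, then answer wildcard queries against the term dictionary.
public class WildcardSketch {
    // Assumed validator-to-tokenizer pairing, for illustration only.
    public static List<String> tokenize(String validator, String value) {
        switch (validator) {
            case "UTF8Type":
            case "AsciiType":
                // Text: lowercase and split on whitespace.
                return Arrays.stream(value.toLowerCase(Locale.ROOT).split("\\s+"))
                        .filter(t -> !t.isEmpty())
                        .collect(Collectors.toList());
            default:
                // Everything else (e.g. LongType): the whole value is one term.
                return List.of(value);
        }
    }

    // term -> row keys, kept in a sorted map standing in for a term dictionary.
    public static Map<String, Set<String>> buildIndex(Map<String, String> rows, String validator) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> row : rows.entrySet()) {
            for (String term : tokenize(validator, row.getValue())) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(row.getKey());
            }
        }
        return index;
    }

    // '*' matches any run of characters, '?' exactly one; everything else is literal.
    public static Pattern wildcardToRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Test every term in the dictionary and union the rows of the matching terms.
    // Patterns should be lowercase, since text terms were lowercased at index time.
    public static Set<String> wildcardSearch(Map<String, Set<String>> index, String wildcard) {
        Pattern p = wildcardToRegex(wildcard);
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : index.entrySet()) {
            if (p.matcher(e.getKey()).matches()) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = buildIndex(
                Map.of("r1", "Apache Cassandra", "r2", "Apache Lucene", "r3", "Hadoop"), "UTF8Type");
        System.out.println(wildcardSearch(index, "luc*"));   // [r2]
        System.out.println(wildcardSearch(index, "apa?he")); // [r1, r2]
    }
}
```

Matching against terms rather than raw column values is why the validator-to-tokenizer mapping matters. The sketch lowercases and splits text before indexing, so a query like `luc*` can find "Apache Lucene" even though the stored value starts with a different word.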
> The downside of this approach is that we will need to read before write, since
> documents in Lucene are written as complete documents. For random workloads
> with lots of indexed columns, this means we need to read the document from
> the index, update it and write it back.
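To make the read-before-write cost concrete, here is a hypothetical sketch of a document-oriented index. Updating a single column requires four steps: read the whole stored document, delete its old terms, merge in the change, and re-index the full document. This roughly mirrors the pattern behind Lucene's `IndexWriter.updateDocument` (delete by term, then add). The class and its read counter are illustrative only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of read-before-write: the index stores whole documents,
// so a single-column update must fetch, merge, and rewrite the full document.
public class ReadBeforeWriteSketch {
    // rowKey -> full document (all indexed columns of the row)
    private final Map<String, Map<String, String>> documents = new HashMap<>();
    // term ("column:value") -> row keys, kept consistent with documents
    private final Map<String, Set<String>> postings = new HashMap<>();
    private int reads = 0; // counts document reads to make the cost visible

    public void updateColumn(String rowKey, String column, String value) {
        // 1. Read: fetch the existing document (extra I/O on a real index).
        Map<String, String> old = documents.get(rowKey);
        reads++;
        // 2. Delete: remove the old document's terms from the index.
        if (old != null) {
            for (Map.Entry<String, String> e : old.entrySet()) {
                removeTerm(e.getKey() + ":" + e.getValue(), rowKey);
            }
        }
        // 3. Merge the changed column into a copy of the old document.
        Map<String, String> merged = old == null ? new HashMap<>() : new HashMap<>(old);
        merged.put(column, value);
        // 4. Write back the complete document and re-index all of its terms.
        documents.put(rowKey, merged);
        for (Map.Entry<String, String> e : merged.entrySet()) {
            postings.computeIfAbsent(e.getKey() + ":" + e.getValue(), t -> new TreeSet<>()).add(rowKey);
        }
    }

    private void removeTerm(String term, String rowKey) {
        Set<String> rows = postings.get(term);
        if (rows == null) {
            return;
        }
        rows.remove(rowKey);
        if (rows.isEmpty()) {
            postings.remove(term);
        }
    }

    public Set<String> search(String column, String value) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(column + ":" + value, Collections.emptySet()));
    }

    public int reads() {
        return reads;
    }

    public static void main(String[] args) {
        ReadBeforeWriteSketch s = new ReadBeforeWriteSketch();
        s.updateColumn("row1", "state", "TX");
        s.updateColumn("row1", "city", "Austin");
        s.updateColumn("row1", "state", "CA");
        System.out.println(s.search("state", "TX") + " " + s.search("state", "CA") + " " + s.reads()); // [] [row1] 3
    }
}
```

Every update pays one document read, regardless of how many columns change. Workloads that touch many indexed columns at random therefore turn each write into a read plus a full re-index.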