[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

Matt Stump (JIRA) Mon, 16 Dec 2013 22:41:30 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850175#comment-13850175
 ]


Matt Stump commented on CASSANDRA-2915:
---------------------------------------

Given that the read before write issues still stand for non-numeric fields (as 
of 4.6), is Lucene based secondary indexes still something we want committed in 
the near term? Do we want to wait until incremental update/stacked segments are 
available for all field types?

Additionally, Lucene, even for near realtime search still imposes a delay 
between when a row is added and when it is query-able which would differ from 
existing behavior; is this something that we can live with?

> Lucene based Secondary Indexes
> ------------------------------
>
>                 Key: CASSANDRA-2915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: T Jake Luciani
>              Labels: secondary_index
>
> Secondary indexes (of type KEYS) suffer from a number of limitations in their 
> current form:
>    - Multiple IndexClauses only work when there is a subset of rows under the 
> highest clause
>    - One new column family is created per index this means 10 new CFs for 10 
> secondary indexes
> This ticket will use the Lucene library to implement secondary indexes as one 
> index per CF, and utilize the Lucene query engine to handle multiple index 
> clauses. Also, by using the Lucene we get a highly optimized file format.
> There are a few parallels we can draw between Cassandra and Lucene.
> Lucene indexes segments in memory then flushes them to disk so we can sync 
> our memtable flushes to lucene flushes. Lucene also has optimize() which 
> correlates to our compaction process, so these can be sync'd as well.
> We will also need to correlate column validators to Lucene tokenizers, so the 
> data can be stored properly, the big win in once this is done we can perform 
> complex queries within a column like wildcard searches.
> The downside of this approach is we will need to read before write since 
> documents in Lucene are written as complete documents. For random workloads 
> with lot's of indexed columns this means we need to read the document from 
> the index, update it and write it back.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

[jira] [Commented] (CASSANDRA-2915) Lucene based Secondary Indexes

Reply via email to