[ https://issues.apache.org/jira/browse/CASSANDRA-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079730#comment-13079730 ]

T Jake Luciani commented on CASSANDRA-2915:
-------------------------------------------

bq. Ok. Typically in distributed search one needs/wants to send the request to 
all of the possible nodes that contain data pertinent to the query. Is this 
possible?

See CASSANDRA-1337: in the worst case it's always going to need to hit all the 
nodes (likewise if we add support for ORDER BY in CQL).


bq. Can we simply define a class that intercepts row updates for a column 
family? Then that class can implement what is needed to analyze the columns / 
row?

The problem is that the type class can be user-defined, so this doesn't get us 
very far. I was thinking we add a new method to the AbstractType class that can 
be overridden, e.g. getLuceneAnalyzer().
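A minimal sketch of what that hook could look like (everything here other than the AbstractType / getLuceneAnalyzer names is hypothetical, and a plain tokenizer function stands in for a real Lucene Analyzer):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: AbstractType gains a method returning the analyzer to
// use when indexing values of that type. A Function<String, List<String>>
// stands in for org.apache.lucene.analysis.Analyzer to keep this self-contained.
abstract class AbstractType {
    // Default: treat the whole value as a single token (keyword-style).
    public Function<String, List<String>> getLuceneAnalyzer() {
        return value -> List.of(value);
    }
}

// A user-defined type can override the hook with a type-appropriate analyzer.
class TextType extends AbstractType {
    @Override
    public Function<String, List<String>> getLuceneAnalyzer() {
        // Lower-case whitespace tokenization, illustrative only.
        return value -> Arrays.asList(value.toLowerCase().split("\\s+"));
    }
}
```

A comparator that doesn't override the hook would keep keyword-style indexing, so existing user-defined types would continue to work unchanged.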



> Lucene based Secondary Indexes
> ------------------------------
>
>                 Key: CASSANDRA-2915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2915
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: T Jake Luciani
>              Labels: secondary_index
>             Fix For: 1.0
>
>
> Secondary indexes (of type KEYS) suffer from a number of limitations in their 
> current form:
>    - Multiple IndexClauses only work when there is a subset of rows under the 
> highest clause
>    - One new column family is created per index; this means 10 new CFs for 10 
> secondary indexes
> This ticket will use the Lucene library to implement secondary indexes as one 
> index per CF, and utilize the Lucene query engine to handle multiple index 
> clauses. Also, by using Lucene we get a highly optimized file format.
> There are a few parallels we can draw between Cassandra and Lucene.
> Lucene builds index segments in memory and then flushes them to disk, so we can 
> sync our memtable flushes to Lucene flushes. Lucene also has optimize(), which 
> corresponds to our compaction process, so these can be sync'd as well.
> We will also need to map column validators to Lucene tokenizers so the data can 
> be stored properly; the big win is that once this is done we can perform 
> complex queries within a column, like wildcard searches.
> The downside of this approach is we will need to read before write since 
> documents in Lucene are written as complete documents. For random workloads 
> with lots of indexed columns, this means we need to read the document from 
> the index, update it and write it back.
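The read-before-write cycle from the description can be sketched as follows (a toy in-memory map stands in for the Lucene index; real code would replace whole documents through Lucene's IndexWriter.updateDocument):

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of read-before-write, assuming a toy "index" keyed by row key.
// Because Lucene stores complete documents, changing one indexed column
// means rewriting the whole document for that row.
class DocumentIndex {
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    void updateColumn(String rowKey, String column, String value) {
        Map<String, String> doc =
            new HashMap<>(docs.getOrDefault(rowKey, Map.of())); // read existing doc
        doc.put(column, value);                                 // apply the column change
        docs.put(rowKey, doc);                                  // write whole doc back
    }

    Map<String, String> get(String rowKey) {
        return docs.get(rowKey);
    }
}
```

For random workloads with many indexed columns per row, each small column update pays for a full document read and rewrite, which is the overhead the description warns about.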

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
