[jira] Updated: (CASSANDRA-749) Secondary indices for column families

Stu Hood (JIRA) Fri, 12 Mar 2010 22:59:54 -0800

     [ https://issues.apache.org/jira/browse/CASSANDRA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Stu Hood updated CASSANDRA-749:
-------------------------------

    Attachment: views-discussion-2.txt

> Which is true, but it ignores the fact that, in most cases, you will have to 
> "query the full cluster" to get the actual matching rows
This was the point of the views being "semi-materialized". If your view 
contains all of the data you were interested in from the base row, and that 
data is recent enough to satisfy a configured recency bound, then you don't 
need to query the base. Please see my comment re: "cribs" in the latest 
attached conversation.
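
As a rough illustration of the "semi-materialized" read path, here is a minimal
Python sketch. It is only one interpretation of the paragraph above: the entry
layout (`columns`, `written_at`, `base_key`), the `max_age_seconds` parameter
and the `fetch_base` callback are all hypothetical names, and the "cribs"
mechanism from the attachment is not modelled.

```python
def read_via_view(view_entry, wanted_columns, now, max_age_seconds, fetch_base):
    """Answer a read from the view when possible, else fall back to the base.

    Returns (columns, queried_base). All names here are hypothetical.
    """
    copied = view_entry["columns"]
    # The view entry is usable only if it is recent enough ...
    fresh = (now - view_entry["written_at"]) <= max_age_seconds
    # ... and it already holds every column the caller asked for.
    covered = all(name in copied for name in wanted_columns)
    if fresh and covered:
        return {name: copied[name] for name in wanted_columns}, False
    # Otherwise pay the extra round trip to the base row.
    base = fetch_base(view_entry["base_key"])
    return {name: base[name] for name in wanted_columns if name in base}, True


# Usage: a view entry that copied "city" from base row "u1" at time 100.
BASE = {"u1": {"city": "Austin", "email": "u1@example.com"}}
entry = {"base_key": "u1", "written_at": 100, "columns": {"city": "Austin"}}
from_view = read_via_view(entry, ["city"], 110, 60, BASE.__getitem__)
from_base = read_via_view(entry, ["email"], 110, 60, BASE.__getitem__)
```

Only the second read, which asks for a column the view never copied, has to
touch the base row.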

> local indexes is better in the common case, since it actually saves a round 
> trip from querying the index to querying the rows
I disagree. I would expect the view to contain a large number of rows, each 
typically depending on a single base row, so querying for one row in the view 
would usually mean querying one or two rows in the base, not necessarily 
thousands. The partitioned index also has much better best-case performance: 
with local secondary indexes, you _always_ need to query every unique 
range/endpoint in the cluster during the first phase, and then merge-sort the 
results from all nodes before you can return a response for even a single row. 
Federating without partitioning will not scale.

Being able to implement these skinny rows (rather than the million-column rows 
that lazyboy attempts) depends on being able to support non-unique row keys, 
but a non-unique key is basically just a compound key: the view key with the 
base key appended, as described on CASSANDRA-767.
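
A minimal sketch of the compound-key idea follows. It is not the design from
CASSANDRA-767: the `SEPARATOR` byte and the helper names are assumptions made
for illustration, and a sorted Python list stands in for an ordered key space.

```python
import bisect

SEPARATOR = "\x00"  # assumed delimiter that never occurs inside a view key


def compound_key(view_key, base_key):
    """Build the view row key: the view key with the base key appended."""
    return view_key + SEPARATOR + base_key


def base_keys_for(sorted_view_keys, view_key):
    """Return the base keys of all view rows that share `view_key`."""
    prefix = view_key + SEPARATOR
    start = bisect.bisect_left(sorted_view_keys, prefix)
    result = []
    for key in sorted_view_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so the matching run has ended
        result.append(key[len(prefix):])
    return result


# Usage: three base rows share the view key "Austin", yet every view row key
# is unique because the base key is part of it.
view_keys = sorted(
    compound_key(v, b)
    for v, b in [("Austin", "u1"), ("Boston", "u3"), ("Austin", "u6"), ("Austin", "u2")]
)
austin = base_keys_for(view_keys, "Austin")
```

Each view row stays skinny (one base row per view row), and all rows for a
given view key sit next to each other in sorted order.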

> locally means you don't have to worry about sharding a very large index since 
> it happens automatically
This is why we have load balancing.

> since the former we can do w/o opening the whole user defined functions box 
> which is a pretty big deal
There is no need to allow for arbitrary functions initially if we take the same 
approach we've taken for comparators: to start, a new view would be defined by 
extending an abstract class. We could easily have a built-in "SecondaryIndex" 
view class that uses a matching column name/value as the row key in the view.
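
To make the abstract-class approach concrete, here is a hedged sketch in
Python (Cassandra itself is Java, and no such classes exist in it as far as
this message states). `View`, `SecondaryIndexView`, `view_keys` and
`apply_write` are hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod


class View(ABC):
    """Hypothetical base class: a view maps a base row to view row keys."""

    @abstractmethod
    def view_keys(self, base_key, columns):
        """Return the view row keys under which this base row should appear."""


class SecondaryIndexView(View):
    """Built-in style view: the value of one named column becomes the view key."""

    def __init__(self, column_name):
        self.column_name = column_name

    def view_keys(self, base_key, columns):
        if self.column_name in columns:
            return [columns[self.column_name]]
        return []  # rows without the column do not appear in the view


def apply_write(view, view_rows, base_key, columns):
    """Update an in-memory stand-in for the view when a base row is written."""
    for view_key in view.view_keys(base_key, columns):
        view_rows.setdefault(view_key, set()).add(base_key)


# Usage: index users by the value of their "city" column.
city_view = SecondaryIndexView("city")
view_rows = {}
apply_write(city_view, view_rows, "u1", {"city": "Austin"})
apply_write(city_view, view_rows, "u2", {"name": "no city column"})
apply_write(city_view, view_rows, "u6", {"city": "Austin"})
```

Other views would subclass `View` and override `view_keys`, which keeps
user-supplied logic behind a fixed interface rather than opening up arbitrary
functions.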

----

Without a way to use these secondary indexes in queries, they are completely 
pointless. Is the intention that the indexes would be used to speed up 
predicates/filters in get_range_slices, or are you proposing that the secondary 
index/view looks and acts like a normal column family, with all of the row 
content, but with the secondary key as the row key? The former seems pointless, 
and the latter seems like it should be implemented using the partitioned 
secondary index approach.

> Secondary indices for column families
> -------------------------------------
>
>                 Key: CASSANDRA-749
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-749
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Gary Dusbabek
>            Assignee: Gary Dusbabek
>            Priority: Minor
>             Fix For: 0.8
>
>         Attachments: 0001-simple-secondary-indices.patch, 
> views-discussion-2.txt, views-discussion.txt
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
