[ https://issues.apache.org/jira/browse/CASSANDRA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stu Hood updated CASSANDRA-749:
-------------------------------
Attachment: views-discussion-2.txt
> Which is true, but it ignores the fact that, in most cases, you will have to
> "query the full cluster" to get the actual matching rows
This was the point of the views being "semi-materialized". If your view
contains all of the data you were interested in from the base row, and it
matches a configured recency, then you don't need to query the base. Please see
my comment re: "cribs" in the latest attached conversation.
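To make the "semi-materialized" read path concrete, here is a minimal Python sketch (Python only for brevity; none of this is Cassandra code). The names ViewRow, read_via_view, and max_age are hypothetical, and the base column family is modeled as a plain dict. The only point it illustrates is the decision above: serve from the view when it covers the requested columns and is recent enough, otherwise fall back to the base row.

```python
import time
from dataclasses import dataclass


@dataclass
class ViewRow:
    """Hypothetical view row: a copy of some base columns plus a write time."""
    base_key: str        # key of the base row this view row depends on
    columns: dict        # column name -> value copied from the base row
    written_at: float    # time (seconds) at which the copy was made


def read_via_view(view_row, wanted, max_age, base, now=None):
    """Return (columns, hit_base) for the requested column names.

    The base row is only read when the view row is too old or does not
    hold every requested column.
    """
    now = time.time() if now is None else now
    fresh = (now - view_row.written_at) <= max_age
    covered = all(name in view_row.columns for name in wanted)
    if fresh and covered:
        return {name: view_row.columns[name] for name in wanted}, False
    base_row = base[view_row.base_key]  # extra round trip to the base
    return {name: base_row[name] for name in wanted}, True
```

In this sketch the recency threshold is a per-call argument; whether it would be configured per view or per query is left open here.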
> local indexes is better in the common case, since it actually saves a round
> trip from querying the index to querying the rows
I disagree. I would expect that the view would contain a large number of rows
(typically depending on 1 row each), so querying for one row in the view would
usually query one or two rows in the base: not necessarily thousands. Also, the
partitioned index has much better best case performance: for the local
secondary indexes, you _always_ need to query every unique range/endpoint in
the cluster during the first phase, and then merge sort the results from all
nodes before you can return a response for even a single row. Federating
without partitioning will not scale.
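A toy Python sketch of the difference in the first phase, under simplifying assumptions: each node is a dict mapping an indexed value to the base keys it knows about, and owner_of is a made-up stand-in for the partitioner. It is not how either proposal would actually be implemented; it only counts how many nodes must be contacted.

```python
import heapq


def owner_of(value, node_count):
    """Hypothetical stand-in for a partitioner: map an index key to one node."""
    return sum(value.encode("utf-8")) % node_count


def query_local_indexes(nodes, value):
    """Local indexes: ask every node, then merge the sorted partial results."""
    per_node = [sorted(node.get(value, [])) for node in nodes]
    merged = list(heapq.merge(*per_node))
    return merged, len(nodes)  # (matching base keys, nodes contacted)


def query_partitioned_index(nodes, value):
    """Partitioned index: only the node that owns the index key is asked."""
    owner = nodes[owner_of(value, len(nodes))]
    return sorted(owner.get(value, [])), 1
```

With N nodes, the local variant contacts N nodes even when a single row matches, while the partitioned variant contacts one (ignoring replication in both cases).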
Being able to implement these skinny rows (rather than the million column rows
lazyboy attempts) depends on being able to support non-unique row keys, but
that is basically just a compound key of the view-key and the base-key
appended, as described on CASSANDRA-767.
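As a rough illustration of the compound-key idea (not the encoding from CASSANDRA-767, which is not reproduced here), the Python sketch below appends the base key to the view key with a hypothetical separator and assumes keys are stored in sorted order, as with an order-preserving partitioner. Each view row stays skinny, and all rows for one view key are found with a prefix scan.

```python
import bisect

SEP = "\x00"  # hypothetical separator; assumed never to occur in a view key


def compound_key(view_key, base_key):
    """Build a unique row key from a non-unique view key plus the base key."""
    return view_key + SEP + base_key


def base_keys_for(sorted_keys, view_key):
    """Prefix-scan a sorted key list for every base key under view_key."""
    prefix = view_key + SEP
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        matches.append(key[len(prefix):])
    return matches
```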
> locally means you don't have to worry about sharding a very large index since
> it happens automatically
This is why we have load balancing.
> since the former we can do w/o opening the whole user defined functions box
> which is a pretty big deal
There is no need to allow for arbitrary functions initially if we take the same
approach we've taken for comparators: to start, a new view would be defined by
extending an abstract class. We could easily have a built in "SecondaryIndex"
view class that uses a matching column name/value as the row key in the view.
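A minimal sketch of what that could look like, written in Python for brevity rather than as Cassandra's actual Java interfaces. AbstractView, SecondaryIndexView, view_keys, and build_view are all hypothetical names, and the base column family is modeled as a dict of dicts.

```python
from abc import ABC, abstractmethod


class AbstractView(ABC):
    """Hypothetical extension point: maps a base row to zero or more view keys."""

    @abstractmethod
    def view_keys(self, base_key, columns):
        """Return the view row keys under which this base row should appear."""


class SecondaryIndexView(AbstractView):
    """Built-in style view: the value of one named column becomes the view key."""

    def __init__(self, column_name):
        self.column_name = column_name

    def view_keys(self, base_key, columns):
        if self.column_name in columns:
            return [columns[self.column_name]]
        return []  # rows without the column are not indexed


def build_view(view, base):
    """Apply a view to every base row: view key -> list of base keys."""
    result = {}
    for base_key, columns in base.items():
        for view_key in view.view_keys(base_key, columns):
            result.setdefault(view_key, []).append(base_key)
    return result
```

Because a view is just a subclass, no user-supplied function needs to be shipped or sandboxed; new view types would be added the same way new comparators are.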
----
Without a way to use these secondary indexes in queries, they are completely
pointless. Is the intention that the indexes would be used to speed up
predicates/filters in get_range_slices, or are you proposing that the secondary
index/view looks and acts like a normal column family, with all of the row
content, but with the secondary key as the row key? The former seems pointless,
and the latter seems like it should be implemented using the partitioned
secondary index approach.
> Secondary indices for column families
> -------------------------------------
>
> Key: CASSANDRA-749
> URL: https://issues.apache.org/jira/browse/CASSANDRA-749
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Gary Dusbabek
> Assignee: Gary Dusbabek
> Priority: Minor
> Fix For: 0.8
>
> Attachments: 0001-simple-secondary-indices.patch,
> views-discussion-2.txt, views-discussion.txt
>
>