[jira] [Commented] (CASSANDRA-4511) Secondary index support for CQL3 collections

Zachary Marcantel (JIRA) Mon, 18 Nov 2013 15:05:40 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825918#comment-13825918
 ]


Zachary Marcantel commented on CASSANDRA-4511:
----------------------------------------------

I can think of (and need this for) a few use cases. They do revolve around more 
of a filtering aspect, however.

It can be said that sometimes models/data/rows belong to a "theoretical list" 
that contain infinite possibilities. 

Naturally, you do not want to store all possibilities, but may want to filter 
on those which are true while ignoring all other possiblities.

Examples of these lists could be:
- Things of interest to a user
  -- LinkedIn calls these skills, Facebook has 'liked pages', etc
- Movies Watched
  -- Netflix surely doesn't (want to) have a USER x MOVIES sized table
  -- Nor do they want (user x movies) number of columnfamilies
- Places Visited
  -- set yes/no for EVERY location on Earth?

Benchmarks may prove me wrong, but theoretically the performance hit would be 
minimal if the data is truly partitioned well, collections are kept small, and 
secondary indexing used only as a filter and not data storage.

Dynamic columns may make some of my examples easier, but bring their own 
headaches (post-filling dynamically created columns, massively wide tables, 
largely unused data == disk bloat).

I'll give a couple examples and use [~jbellis] syntax, as well as a potential 
map-based indexing.

- Users contained in group(s):
-- Note: this could be done with columns, but if we assume groups can contain 
infinitely many possibilities (like Facebook groups)
- {code:sql}
SELECT * FROM main.users WHERE 'players' IN groups AND 'admins' NOT IN groups;
{code}

- Filter on toggle-based UI elements within user profiles:
{code:sql}
SELECT * FROM main.users WHERE notify['email'] = true;
{code}


Given a possibly endless list, map rows (or data pieces) onto items within that 
list:
For instance, user profiles often have 'interests' that could contain 
one/multiple of thousands if not millions of possibilities.
Currently, one would have to detail the entirety of the list that has been seen 
in one of two ways:
{code:sql}
CREATE TABLE main.interests (
    interest_name TEXT PRIMARY KEY,
    users LIST <TEXT>
);
{code}

OR

{code:sql}
CREATE TABLE main.users (
    id UUID PRIMARY KEY,
    ... other user fields ...
    interests LIST <TEXT>
);
{code}

Both of which would require post-result processing (map-reduce or similar) to 
find just the users containing a certain key/value.

Rather, with indexing:
{code:sql}
CREATE TABLE main.users (
    id UUID PRIMARY KEY,
    name TEXT,
    age INT,
    interests LIST <TEXT>
);
{code}

where 'interests' is a relatively small (~10-25 elements) list that can be 
filtered by:

{code:sql}
SELECT * FROM main.users WHERE 'baseball' IN interests AND 'soccer' IN 
interests;
{code}

> Secondary index support for CQL3 collections 
> ---------------------------------------------
>
>                 Key: CASSANDRA-4511
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4511
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.1
>
>         Attachments: 4511.txt
>
>
> We should allow to 2ndary index on collections. A typical use case would be 
> to add a 'tag set<String>' to say a user profile and to query users based on 
> what tag they have.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (CASSANDRA-4511) Secondary index support for CQL3 collections

Reply via email to