Jon Haddad created CASSANDRA-11194:
--------------------------------------

             Summary: materialized views - support explode() on collections
                 Key: CASSANDRA-11194
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11194
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Jon Haddad


I'm working on a database design to model a product catalog.  Products can 
belong to categories.  Categories can belong to multiple sub categories (think 
about Amazon's complex taxonomies).

My category table would look like this, giving me individual categories & their 
parents:

{code}
CREATE TABLE category (
    category_id uuid primary key,
    name text,
    parents set<uuid>
);
{code}

To get a list of all the children of a particular category, I need a table that 
looks like the following:

{code}
CREATE TABLE categories_by_parent (
    parent_id uuid,
    category_id uuid,
    name text,
    primary key (parent_id, category_id)
);
{code}

The important thing to note here is that a single category can have multiple 
parents.

I'd like to propose support for collections in materialized views via an 
explode() function that would create 1 row per item in the collection.  For 
instance, I'll insert the following 3 rows (2 parents, 1 child) into the 
category table:

{code}
insert into category (category_id, name, parents) values 
(009fe0e1-5b09-4efc-a92d-c03720324a4f, 'Parent', null);

insert into category (category_id, name, parents) values 
(1f2914de-0adf-4afc-b7ad-ddd8dc876ab1, 'Parent2', null);

insert into category (category_id, name, parents) values 
(1f93bc07-9874-42a5-a7d1-b741dc9c509c, 'Child', 
{009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 });

cqlsh:test> select * from category;

 category_id                          | name    | parents
--------------------------------------+---------+------------------------------------------------------------------------------
 009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                               
                                          null
 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                               
                                          null
 1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | 
{009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}

(3 rows)

{code}

Given the following CQL to select the child category, utilizing an explode 
function, I would expect to get back 2 rows, 1 for each parent:

{code}
select category_id, name, explode(parents) as parent_id from category where 
category_id = 1f93bc07-9874-42a5-a7d1-b741dc9c509c;

category_id                          | name  | parent_id
--------------------------------------+-------+--------------------------------------
1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 
009fe0e1-5b09-4efc-a92d-c03720324a4f
1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 
1f2914de-0adf-4afc-b7ad-ddd8dc876ab1

(2 rows)
{code}

This functionality would ideally apply to materialized views, since the ability 
to control partitioning here would allow us to efficiently query our MV for all 
categories belonging to a parent in a complex taxonomy.

{code}
CREATE MATERIALIZED VIEW categories_by_parent as
SELECT explode(parents) as parent_id,
        category_id, name FROM category WHERE parents IS NOT NULL
{code}

The explode() function is available in Spark Dataframes and my proposed 
function has the same behavior: 
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.explode




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to