[ 
https://issues.apache.org/jira/browse/CASSANDRA-11194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251727#comment-15251727
 ] 

Keith Wansbrough commented on CASSANDRA-11194:
----------------------------------------------

We would also find it very useful to be able to create a materialized view on a 
set. The {{explode}} syntax looks good for this:
{code}
CREATE TABLE customers (
  id text PRIMARY KEY,
  data text,
  phones frozen<set<text>>
);
  
CREATE MATERIALIZED VIEW customers_by_phone AS
  SELECT explode(phones), id
  FROM customers
  WHERE phones IS NOT NULL;
{code}

We have a database of customers with an ID as primary key. Each customer has 
zero or more phone numbers. We would like to be able to create a materialized 
view so we can look up by phone number.

Our current schema uses a frozen set for this, but either frozen or unfrozen 
would be fine.

> materialized views - support explode() on collections
> -----------------------------------------------------
>
>                 Key: CASSANDRA-11194
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11194
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jon Haddad
>
> I'm working on a database design to model a product catalog.  Products can 
> belong to categories.  Categories can belong to multiple sub categories 
> (think about Amazon's complex taxonomies).
> My category table would look like this, giving me individual categories & 
> their parents:
> {code}
> CREATE TABLE category (
>     category_id uuid primary key,
>     name text,
>     parents set<uuid>
> );
> {code}
> To get a list of all the children of a particular category, I need a table 
> that looks like the following:
> {code}
> CREATE TABLE categories_by_parent (
>     parent_id uuid,
>     category_id uuid,
>     name text,
>     primary key (parent_id, category_id)
> );
> {code}
> The important thing to note here is that a single category can have multiple 
> parents.
> I'd like to propose support for collections in materialized views via an 
> explode() function that would create 1 row per item in the collection.  For 
> instance, I'll insert the following 3 rows (2 parents, 1 child) into the 
> category table:
> {code}
> insert into category (category_id, name, parents) values 
> (009fe0e1-5b09-4efc-a92d-c03720324a4f, 'Parent', null);
> insert into category (category_id, name, parents) values 
> (1f2914de-0adf-4afc-b7ad-ddd8dc876ab1, 'Parent2', null);
> insert into category (category_id, name, parents) values 
> (1f93bc07-9874-42a5-a7d1-b741dc9c509c, 'Child', 
> {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 
> });
> cqlsh:test> select * from category;
>  category_id                          | name    | parents
> --------------------------------------+---------+------------------------------------------------------------------------------
>  009fe0e1-5b09-4efc-a92d-c03720324a4f |  Parent |                             
>                                             null
>  1f2914de-0adf-4afc-b7ad-ddd8dc876ab1 | Parent2 |                             
>                                             null
>  1f93bc07-9874-42a5-a7d1-b741dc9c509c |   Child | 
> {009fe0e1-5b09-4efc-a92d-c03720324a4f, 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1}
> (3 rows)
> {code}
> Given the following CQL to select the child category, utilizing an explode 
> function, I would expect to get back 2 rows, 1 for each parent:
> {code}
> select category_id, name, explode(parents) as parent_id from category where 
> category_id = 1f93bc07-9874-42a5-a7d1-b741dc9c509c;
> category_id                          | name  | parent_id
> --------------------------------------+-------+--------------------------------------
> 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 
> 009fe0e1-5b09-4efc-a92d-c03720324a4f
> 1f93bc07-9874-42a5-a7d1-b741dc9c509c | Child | 
> 1f2914de-0adf-4afc-b7ad-ddd8dc876ab1
> (2 rows)
> {code}
> This functionality would ideally apply to materialized views, since the 
> ability to control partitioning here would allow us to efficiently query our 
> MV for all categories belonging to a parent in a complex taxonomy.
> {code}
> CREATE MATERIALIZED VIEW categories_by_parent as
> SELECT explode(parents) as parent_id,
>         category_id, name FROM category WHERE parents IS NOT NULL
> {code}
> The explode() function is available in Spark Dataframes and my proposed 
> function has the same behavior: 
> http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.explode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to