[
https://issues.apache.org/jira/browse/CASSANDRA-7062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Constance Eustace updated CASSANDRA-7062:
-----------------------------------------
Description:
CASSANDRA-6561 implemented static columns for a given partition key.
What this is proposing for a compound cluster key is a static column that is
static at intermediate parts of a compound cluster key. This example shows a
table modelling a moderately complex EAV pattern :
{code}
CREATE TABLE t (
entityID text,
propertyName text,
valueIndex text,
entityName text static (entityID),
propertyType text static (entityID, propertyName),
propertyRelations List<text> static (entityID, propertyName),
data text,
PRIMARY KEY (entityID, (propertyName,valueIndex))
)
{code}
So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all
cluster rows have the same value
- the propertyType and propertyRelations columns are static with respect to the
remaining parts of the cluster key (that is, across all valueIndex values for a
given propertyName), so an update to those values for an entityID and a
propertyName will be shared/constant by all the value rows...
Is this a relatively simple extension of the same mechanism in -6561, or is
this a "whoa, you have no idea what you are proposing"?
Sample data:
Mary and Jane aren't married...
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
VALUES ('0001','MARY MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
VALUES ('0002','JANE JOHNSON','kids','NOVALUE','');
{code}
{code}
SELECT * FROM t:
0001 MARY MATALIN married SingleValue 0 false
0001 MARY MATALIN kids NOVALUE null
0002 JANE JOHNSON married SingleValue 0 false
0002 JANE JOHNSON kids NOVALUE null
{code}
Then mary and jane get married (so the entityName column that is static on the
partition key is updated just like CASSANDRA-6561 )
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids NOVALUE null
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids NOVALUE null
{code}
Then mary and jane have a kid, so we add another value to the kids attribute:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0002','kids','SingleValue','0','JENNY');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids SingleValue null
0001 MARY SMITH kids SingleValue 0 JIM-BOB
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids SingleValue null
0002 JANE JONES kids SingleValue 0 JENNY
{code}
Then Mary has ANOTHER kid, which demonstrates the partially static column
relative to the cluster key, as ALL value rows for the property 'kids' get
updated to the new value:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0001','kids','MultiValue','1','HARRY');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids MultiValue null
0001 MARY SMITH kids MultiValue 0 JIM-BOB
0001 MARY SMITH kids MultiValue 1 HARRY
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids SingleValue null
0002 JANE JONES kids SingleValue 0 JENNY
{code}
... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid
hack bug in there with the null/empty row for the kids attribute, but please
bear with me on that
Generally speaking, this will aid in flattening / denormalization of relational
constructs into cassandra-friendly schemas. In the above example we are
flattening a relational schema of three tables: entity, property, and value
tables into a single sparse flattened denormalized compound table.
was:
CASSANDRA-6561 implemented static columns for a given partition key.
What this is proposing for a compound cluster key is a static column that is
static at intermediate parts of a compound cluster key. This example shows a
table modelling a moderately complex EAV pattern :
{code}
CREATE TABLE t (
entityID text,
propertyName text,
valueIndex text,
entityName text static (entityID),
propertyType text static (entityID, propertyName),
propertyRelations List<text> static (entityID, propertyName),
data text,
PRIMARY KEY (entityID, (propertyName,valueIndex))
)
{code}
So in this example has the following static columns:
- the entityName column behaves exactly as CASSANDRA-6561 details, so all
cluster rows have the same value
- the propertyType and propertyRelations columns are static with respect to the
remaining parts of the cluster key (that is, across all valueIndex values for a
given propertyName), so an update to those values for an entityID and a
propertyName will be shared/constant by all the value rows...
Is this a relatively simple extension of the same mechanism in -6561, or is
this a "whoa, you have no idea what you are proposing"?
Sample data:
Mary and Jane aren't married...
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
VALUES ('0001','MARY MATALIN','kids','NOVALUE','');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
VALUES ('0002','JANE JOHNSON','kids','NOVALUE','');
{code}
{code}
SELECT * FROM t:
0001 MARY MATALIN married SingleValue 0 false
0001 MARY MATALIN kids NOVALUE null
0002 JANE JOHNSON married SingleValue 0 false
0002 JANE JOHNSON kids NOVALUE null
{code}
Then mary and jane get married (so the entityName column that is static on the
partition key is updated just like CASSANDRA-6561 )
{code}
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE');
INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids NOVALUE null
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids NOVALUE null
{code}
Then mary and jane have a kid, so we add another value to the kids attribute:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0001','kids','SingleValue','0','JIM-BOB');
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0002','kids','SingleValue','0','JENNY');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids SingleValue null
0001 MARY SMITH kids SingleValue 0 JIM-BOB
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids SingleValue null
0002 JANE JONES kids SingleValue 0 JENNY
{code}
Then Mary has ANOTHER kid, which demonstrates the partially static column
relative to the cluster key, as ALL value rows for the property 'kids' get
updated to the new value:
{code}
INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
('0001','kids','MultiValue','1','HARRY');
{code}
{code}
SELECT * FROM t:
0001 MARY SMITH married SingleValue 0 TRUE
0001 MARY SMITH kids MultiValue null
0001 MARY SMITH kids MultiValue 0 JIM-BOB
0001 MARY SMITH kids MultiValue 1 HARRY
0002 JANE JONES married SingleValue 0 TRUE
0002 JANE JONES kids SingleValue null
0002 JANE JONES kids SingleValue 0 JENNY
{code}
... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid
hack bug in there with the null/empty row for the kids attribute, but please
bear with me on that
Generally speaking, this will aid in flattening / denormalization of relational
constructs into cassandra-friendly schemas. In the above example we are
flattening a relational schema of three tables: entity, property, and value
tables into a single sparse flattened denormalized compound table.
> Extension of static columns for compound cluster keys
> -----------------------------------------------------
>
> Key: CASSANDRA-7062
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7062
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Constance Eustace
> Priority: Minor
>
> CASSANDRA-6561 implemented static columns for a given partition key.
> What this is proposing for a compound cluster key is a static column that is
> static at intermediate parts of a compound cluster key. This example shows a
> table modelling a moderately complex EAV pattern :
> {code}
> CREATE TABLE t (
> entityID text,
> propertyName text,
> valueIndex text,
> entityName text static (entityID),
> propertyType text static (entityID, propertyName),
> propertyRelations List<text> static (entityID, propertyName),
> data text,
> PRIMARY KEY (entityID, (propertyName,valueIndex))
> )
> {code}
> So in this example has the following static columns:
> - the entityName column behaves exactly as CASSANDRA-6561 details, so all
> cluster rows have the same value
> - the propertyType and propertyRelations columns are static with respect to
> the remaining parts of the cluster key (that is, across all valueIndex values
> for a given propertyName), so an update to those values for an entityID and a
> propertyName will be shared/constant by all the value rows...
> Is this a relatively simple extension of the same mechanism in -6561, or is
> this a "whoa, you have no idea what you are proposing"?
> Sample data:
> Mary and Jane aren't married...
> {code}
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
> data) VALUES ('0001','MARY MATALIN','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
> data) VALUES ('0002','JANE JOHNSON','married','SingleValue','0','false');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
> VALUES ('0001','MARY MATALIN','kids','NOVALUE','');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex)
> VALUES ('0002','JANE JOHNSON','kids','NOVALUE','');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY MATALIN married SingleValue 0 false
> 0001 MARY MATALIN kids NOVALUE null
> 0002 JANE JOHNSON married SingleValue 0 false
> 0002 JANE JOHNSON kids NOVALUE null
> {code}
> Then mary and jane get married (so the entityName column that is static on
> the partition key is updated just like CASSANDRA-6561 )
> {code}
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
> data) VALUES ('0001','MARY SMITH','married','SingleValue','0','TRUE');
> INSERT INTO t (entityID, entityName, propertyName, propertyType, valueIndex,
> data) VALUES ('0002','JANE JONES','married','SingleValue','0','TRUE');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH married SingleValue 0 TRUE
> 0001 MARY SMITH kids NOVALUE null
> 0002 JANE JONES married SingleValue 0 TRUE
> 0002 JANE JONES kids NOVALUE null
> {code}
> Then mary and jane have a kid, so we add another value to the kids attribute:
> {code}
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
> ('0001','kids','SingleValue','0','JIM-BOB');
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
> ('0002','kids','SingleValue','0','JENNY');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH married SingleValue 0 TRUE
> 0001 MARY SMITH kids SingleValue null
> 0001 MARY SMITH kids SingleValue 0 JIM-BOB
> 0002 JANE JONES married SingleValue 0 TRUE
> 0002 JANE JONES kids SingleValue null
> 0002 JANE JONES kids SingleValue 0 JENNY
> {code}
> Then Mary has ANOTHER kid, which demonstrates the partially static column
> relative to the cluster key, as ALL value rows for the property 'kids' get
> updated to the new value:
> {code}
> INSERT INTO t (entityID, propertyName, propertyType, valueIndex,data) VALUES
> ('0001','kids','MultiValue','1','HARRY');
> {code}
> {code}
> SELECT * FROM t:
> 0001 MARY SMITH married SingleValue 0 TRUE
> 0001 MARY SMITH kids MultiValue null
> 0001 MARY SMITH kids MultiValue 0 JIM-BOB
> 0001 MARY SMITH kids MultiValue 1 HARRY
> 0002 JANE JONES married SingleValue 0 TRUE
> 0002 JANE JONES kids SingleValue null
> 0002 JANE JONES kids SingleValue 0 JENNY
> {code}
> ... ok, hopefully that example isn't TOO complicated. Yes, there's a stupid
> hack bug in there with the null/empty row for the kids attribute, but please
> bear with me on that
> Generally speaking, this will aid in flattening / denormalization of
> relational constructs into cassandra-friendly schemas. In the above example
> we are flattening a relational schema of three tables: entity, property, and
> value tables into a single sparse flattened denormalized compound table.
--
This message was sent by Atlassian JIRA
(v6.2#6252)