[jira] [Created] (CASSANDRA-13826) Specialize row structure to support complex Materialized Views liveness

Paulo Motta (JIRA) Wed, 30 Aug 2017 03:30:44 -0700

Paulo Motta created CASSANDRA-13826:
---------------------------------------


             Summary: Specialize row structure to support complex Materialized Views liveness
                 Key: CASSANDRA-13826
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13826
             Project: Cassandra
          Issue Type: Improvement
          Components: Materialized Views
            Reporter: Paulo Motta


Unlike an ordinary row, which is live if its PK or any column is live, a view 
row has different liveness requirements, summarized by [~jasonstack] on 
CASSANDRA-11500:

{quote}
1. base pk and view pk are the same (order doesn't matter) and view has no 
filter conditions or only conditions on base pk.
(filter condition means: c = 1 in view's where clause. filter condition is not a 
concern here, since there is no previous view data to be cleared.)

view row exists if any of following is true:
* base row pk has live livenessInfo(timestamp) and base row pk satisfies view's 
filter conditions if any.
* or one of base row columns selected in view has live timestamp (via update) 
and base row pk satisfies view's filter conditions if any. this is handled by 
existing mechanism of liveness and tombstone since all info are included in 
view row
* or one of base row columns not selected in view has live timestamp (via 
update) and base row pk satisfies view's filter conditions if any. Those 
unselected columns' timestamp/ttl/cell-deletion info are not currently stored 
on view row.

2. base column used in view pk or view has filter conditions on base non-key 
column which can also lead to entire view row being wiped.

view row exists if any of following is true:

* base row pk has live livenessInfo(timestamp) && base column used in view pk 
is not null but no timestamp && conditions are satisfied. ( pk having live 
livenessInfo means it is not deleted by tombstone)
* or base row column in view pk has timestamp (via update) && conditions are 
satisfied. eg. if base column used in view pk is TTLed, entire view row should 
be wiped.
{quote}
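
The quoted rules can be read as boolean predicates. The following is a minimal sketch of that reading, not code from Cassandra: the class name, method names and boolean parameters are all hypothetical, and each parameter stands in for a fact that the real storage engine would have to derive from liveness info, cell timestamps and tombstones.

```java
// Illustrative model only: these are not Cassandra's Row or LivenessInfo types.
final class ViewLivenessSketch {

    private ViewLivenessSketch() {}

    /** Ordinary row: live if its PK liveness info or any column is live. */
    static boolean ordinaryRowLive(boolean pkLive, boolean anyColumnLive) {
        return pkLive || anyColumnLive;
    }

    /**
     * Case 1 of the quote: view PK and base PK are the same, and the view
     * filters only on base PK columns (or not at all).
     * Note that a live UNSELECTED column also keeps the view row alive.
     */
    static boolean case1ViewRowExists(boolean basePkLive,
                                      boolean selectedColumnLive,
                                      boolean unselectedColumnLive,
                                      boolean pkFilterSatisfied) {
        return pkFilterSatisfied
            && (basePkLive || selectedColumnLive || unselectedColumnLive);
    }

    /**
     * Case 2 of the quote: a base column is used in the view PK, or the view
     * filters on a base non-key column. Liveness of other columns does not
     * keep the view row alive here.
     */
    static boolean case2ViewRowExists(boolean basePkLive,
                                      boolean viewPkColumnSetWithoutTimestamp,
                                      boolean viewPkColumnHasLiveTimestamp,
                                      boolean conditionsSatisfied) {
        boolean viaRowMarker = basePkLive && viewPkColumnSetWithoutTimestamp;
        return conditionsSatisfied && (viaRowMarker || viewPkColumnHasLiveTimestamp);
    }

    public static void main(String[] args) {
        // Case 1: only an unselected column is live, yet the view row exists.
        System.out.println(case1ViewRowExists(false, false, true, true));
        // Case 2: base PK is live but the view PK column is gone (e.g. TTLed).
        System.out.println(case2ViewRowExists(true, false, false, true));
    }
}
```

The third argument of `case1ViewRowExists` is the problematic one: as the quote notes, the information needed to evaluate it is not currently stored on the view row.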

These additional requirements were overlooked during the original MV design and 
caused problems when base rows or columns are updated or removed, as described 
in CASSANDRA-13127, CASSANDRA-13409 and CASSANDRA-11500.

In CASSANDRA-11500 we will tweak the existing mechanism to fix most of the 
above issues. Two cases will remain unsupported: out-of-order deletion of an 
unselected column on a view sharing partition key components with the base 
table, and filtering by non-PK columns. The former is a limitation of the 
original MV design. The latter is a relatively recent feature 
(CASSANDRA-10368) that overlooked this requirement and will be reverted in 
CASSANDRA-13798.
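
To illustrate why out-of-order deletion of an unselected column needs per-column information, here is a minimal sketch under stated assumptions. It is not the virtual cells proposal and not Cassandra code: `UnselectedColumnSketch`, `Cell`, `apply` and `anyLive` are hypothetical names, and the tie-breaking rule (a deletion wins on equal timestamps) is an assumption of the sketch. The point is only that keeping the latest timestamped state per unselected column makes the result independent of arrival order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: tracks the newest write seen for each unselected base column.
final class UnselectedColumnSketch {

    /** One write to a base column: a value (live) or a deletion, with its timestamp. */
    static final class Cell {
        final long timestamp;
        final boolean live;

        Cell(long timestamp, boolean live) {
            this.timestamp = timestamp;
            this.live = live;
        }
    }

    private final Map<String, Cell> latest = new HashMap<>();

    /** Last-write-wins merge; on a timestamp tie the deletion wins (sketch assumption). */
    void apply(String column, Cell incoming) {
        Cell current = latest.get(column);
        boolean newer = current == null
            || incoming.timestamp > current.timestamp
            || (incoming.timestamp == current.timestamp && !incoming.live);
        if (newer) {
            latest.put(column, incoming);
        }
    }

    /** True if at least one unselected column is currently live. */
    boolean anyLive() {
        for (Cell cell : latest.values()) {
            if (cell.live) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Deletion at timestamp 2 arrives before the update at timestamp 1.
        UnselectedColumnSketch view = new UnselectedColumnSketch();
        view.apply("b", new Cell(2L, false));
        view.apply("b", new Cell(1L, true));
        System.out.println(view.anyLive()); // the older update must not resurrect the row
    }
}
```

Applying the same two writes in timestamp order gives the same answer, which is the commutativity property the view row would need for unselected columns.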

This ticket is to go back to the drawing board and discuss and implement a 
storage engine extension to properly support the following cases:
- Out-of-order deletion of an unselected column on a view sharing partition key 
components with the base table ([ignored 
test|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewTest.java#L87])
- Filtering by non-primary key and/or unselected columns (Follow-up 
CASSANDRA-13798, [ignored 
tests|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewFilteringTest.java#L88])
- Rethink shadowable tombstone mechanism and remove workarounds introduced by 
CASSANDRA-11500, such as using expired liveness info to represent commutative 
deletion 
([TODO|https://github.com/apache/cassandra/blob/e0da138ab10f6c0fc014de86fb251e11358d80cc/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java#L429]).
- Add support for dropping unselected columns on the base table and reflect that on 
views ([commented 
test|https://github.com/apache/cassandra/commit/add5face50f2eccbc1a53e0fe22e2d79ba856db1])
- Upgrade from the previous to the new structure

Zhao's virtual cells proposal from CASSANDRA-11500 is probably a good starting 
point, but we need to discuss and validate it to make sure it is efficient, 
makes adequate reuse of existing structures, and does not introduce unnecessary 
complexity in the storage engine that we will have to maintain in the future. 
We should probably also consider supporting multiple non-PK columns in MV 
clustering (CASSANDRA-10226), which introduces liveness requirements for views 
beyond the ones mentioned above, as well as any other simplifications we can 
make to the view row structure.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
