Paulo Motta created CASSANDRA-13826:
---------------------------------------
Summary: Specialize row structure to support complex Materialized
Views liveness
Key: CASSANDRA-13826
URL: https://issues.apache.org/jira/browse/CASSANDRA-13826
Project: Cassandra
Issue Type: Improvement
Components: Materialized Views
Reporter: Paulo Motta
Unlike an ordinary row, which is live if its PK or any column is live, a view
row has different liveness requirements, summarized by [~jasonstack] on
CASSANDRA-11500:
{quote}
1. base pk and view pk are the same (order doesn't matter) and view has no
filter conditions or only conditions on base pk.
(filter condition means e.g. c = 1 in the view's where clause. filter conditions
are not a concern here, since there is no previous view data to be cleared.)
view row exists if any of following is true:
* base row pk has live livenessInfo(timestamp) and base row pk satisfies view's
filter conditions if any.
* or one of base row columns selected in view has live timestamp (via update)
and base row pk satisfies view's filter conditions if any. this is handled by
the existing mechanism of liveness and tombstones, since all info is included
in the view row
* or one of base row columns not selected in view has live timestamp (via
update) and base row pk satisfies view's filter conditions if any. Those
unselected columns' timestamp/ttl/cell-deletion info is not currently stored
on the view row.
2. base column used in view pk or view has filter conditions on base non-key
column which can also lead to entire view row being wiped.
view row exists if any of following is true:
* base row pk has live livenessInfo(timestamp) && base column used in view pk
is not null but no timestamp && conditions are satisfied. ( pk having live
livenessInfo means it is not deleted by tombstone)
* or base row column in view pk has timestamp (via update) && conditions are
satisfied. eg. if base column used in view pk is TTLed, entire view row should
be wiped.
{quote}
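As a reading aid, the two cases quoted above can be written down as boolean
predicates. This is only a minimal sketch: ViewLivenessSketch, BaseRowState
and every field name are hypothetical and do not exist in Cassandra, and each
flag is assumed to have been computed elsewhere from the base row.

```java
/** Hypothetical model of the quoted rules; none of these names exist in Cassandra. */
class ViewLivenessSketch {

    /** Assumed, precomputed facts about one base row. */
    static final class BaseRowState {
        final boolean pkLive;                    // base row pk has live livenessInfo
        final boolean selectedColumnLive;        // a column selected in the view is live via update
        final boolean unselectedColumnLive;      // a column not selected in the view is live via update
        final boolean viewPkColumnNonNull;       // base column used in view pk is not null
        final boolean viewPkColumnLiveViaUpdate; // that column has a live timestamp via update
        final boolean filterSatisfied;           // view's filter conditions hold (true if none)

        BaseRowState(boolean pkLive, boolean selectedColumnLive, boolean unselectedColumnLive,
                     boolean viewPkColumnNonNull, boolean viewPkColumnLiveViaUpdate,
                     boolean filterSatisfied) {
            this.pkLive = pkLive;
            this.selectedColumnLive = selectedColumnLive;
            this.unselectedColumnLive = unselectedColumnLive;
            this.viewPkColumnNonNull = viewPkColumnNonNull;
            this.viewPkColumnLiveViaUpdate = viewPkColumnLiveViaUpdate;
            this.filterSatisfied = filterSatisfied;
        }
    }

    /** Case 1: base pk and view pk are the same; any of the three bullets keeps the row. */
    static boolean existsWhenViewPkEqualsBasePk(BaseRowState r) {
        return r.filterSatisfied
            && (r.pkLive || r.selectedColumnLive || r.unselectedColumnLive);
    }

    /** Case 2: a base column is used in the view pk; literal transcription of the two bullets. */
    static boolean existsWhenBaseColumnInViewPk(BaseRowState r) {
        boolean viaPk = r.pkLive && r.viewPkColumnNonNull && !r.viewPkColumnLiveViaUpdate;
        boolean viaColumn = r.viewPkColumnLiveViaUpdate;
        return r.filterSatisfied && (viaPk || viaColumn);
    }

    public static void main(String[] args) {
        // Base row created only by an UPDATE of an unselected column: no pk liveness.
        BaseRowState onlyUnselected = new BaseRowState(false, false, true, false, false, true);
        System.out.println(existsWhenViewPkEqualsBasePk(onlyUnselected));
    }
}
```

The third disjunct in case 1 is the one the current view row cannot evaluate,
since the unselected columns' liveness info is not stored on the view row.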
These additional requirements were overlooked during the original MV design and
caused some problems when base rows or columns are updated or removed, as
described on CASSANDRA-13127, CASSANDRA-13409 and CASSANDRA-11500.
On CASSANDRA-11500 we will tweak the existing mechanism to fix most of the
above issues, with two exceptions: correct support for out-of-order deletion of
an unselected column on a view sharing partition key components with the base,
and filtering by non-PK columns. The former is a limitation of the original MV
design; the latter is a relatively recent feature (CASSANDRA-10368) that
overlooked this requirement and will be reverted on CASSANDRA-13798.
This ticket is to go back to the drawing board and discuss and implement a
storage engine extension to properly support the following cases:
- Out-of-order deletion of unselected column on view sharing partition key
components with base ([ignored
test|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewTest.java#L87])
- Filtering by non-primary key and/or unselected columns (Follow-up
CASSANDRA-13798, [ignored
tests|https://github.com/apache/cassandra/blob/add5face50f2eccbc1a53e0fe22e2d79ba856db1/test/unit/org/apache/cassandra/cql3/ViewFilteringTest.java#L88])
- Rethink shadowable tombstone mechanism and remove workarounds introduced by
CASSANDRA-11500, such as using expired liveness info to represent commutative
deletion
([TODO|https://github.com/apache/cassandra/blob/e0da138ab10f6c0fc014de86fb251e11358d80cc/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java#L429]).
- Add support for dropping unselected columns on the base table and reflect that on
views ([commented
test|https://github.com/apache/cassandra/commit/add5face50f2eccbc1a53e0fe22e2d79ba856db1])
- Upgrade from the previous to the new structure
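To illustrate the first case in the list, here is a toy model of why a single
summary timestamp pair per view row cannot handle a late-arriving write to an
unselected column, while per-column tracking can. It is a sketch under strong
assumptions: both classes are hypothetical, timestamps are plain longs
reconciled by "highest wins", and neither class reflects how Cassandra or the
CASSANDRA-11500 patch actually stores liveness.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model; not Cassandra's storage format. */
class UnselectedColumnOrderingSketch {

    /** One write timestamp and one deletion timestamp for the whole view row. */
    static final class CollapsedLiveness {
        private long liveTs = Long.MIN_VALUE;
        private long deletedTs = Long.MIN_VALUE;

        void write(long ts)  { liveTs = Math.max(liveTs, ts); }
        void delete(long ts) { deletedTs = Math.max(deletedTs, ts); }
        boolean isLive()     { return liveTs > deletedTs; }
    }

    /** A write/deletion timestamp pair kept separately for each unselected column. */
    static final class PerColumnLiveness {
        // value[0] = latest write timestamp, value[1] = latest deletion timestamp
        private final Map<String, long[]> columns = new HashMap<>();

        private long[] slot(String column) {
            return columns.computeIfAbsent(column,
                    k -> new long[] { Long.MIN_VALUE, Long.MIN_VALUE });
        }

        void write(String column, long ts)  { long[] s = slot(column); s[0] = Math.max(s[0], ts); }
        void delete(String column, long ts) { long[] s = slot(column); s[1] = Math.max(s[1], ts); }

        /** The row is live if at least one column's write is newer than its deletion. */
        boolean isLive() {
            for (long[] s : columns.values()) {
                if (s[0] > s[1]) return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        // Arrival order: write a@1, delete a@2, then a late write b@1.
        CollapsedLiveness collapsed = new CollapsedLiveness();
        collapsed.write(1); collapsed.delete(2); collapsed.write(1);

        PerColumnLiveness perColumn = new PerColumnLiveness();
        perColumn.write("a", 1); perColumn.delete("a", 2); perColumn.write("b", 1);

        System.out.println("collapsed live:  " + collapsed.isLive());
        System.out.println("per-column live: " + perColumn.isLive());
    }
}
```

In this scenario the base row is still live through column b, so the view row
should exist. The collapsed model shadows it with the deletion of a, while the
per-column model keeps it, at the cost of storing extra information per view
row.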
Zhao's virtual cells proposal from CASSANDRA-11500 is probably a good starting
point, but we need to discuss and validate it to make sure it is efficient,
makes adequate reuse of existing structures and does not introduce unnecessary
complexity in the storage engine that we'll have to maintain in the future. We
should also consider supporting multiple non-PK columns in MV clustering
(CASSANDRA-10226), which introduces liveness requirements for views beyond the
ones mentioned above, as well as other simplifications we can make to the view
row structure.