[
https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040527#comment-16040527
]
Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/23/17 3:26 AM:
---------------------------------------------------------------------------
{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
cqlsh> SELECT * FROM test.table1;
id | name | enabled | foo
----+------+---------+-----
1 | One | True | Bar
(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv1;
name | id | foo
------+----+-----
(0 rows)
{code}
Logic in ViewUpdateGenerator.java does not update the view row if updated
column is not denormalized in the view.
in the above case,{{enabled}} is not denormalized and so update has not
propagated to the view.View metadata only has pk columns + columns in select
statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that
all non-primary key columns that have filters are denormalized.With this we can
also make sure that {{ALTER TABLE}} does not drop a column that is used in view.
(delete does not do this check because it does not have to. This is the reason
row delete worked when {{enabled}} is set to false.)
{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;
name | id | enabled | foo
------+----+---------+------
One | 1 | True | null
(1 rows)
{code}
This happens because of the way liveliness/deletion info is computed in the
view.
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of
all the columns (including non-pk) in the row and uses it in Deletion info when
deleting.
But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest
timestamp of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones
have higher timestamp than live cell from base row.
I have uploaded a [patch for 3.11 |
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
which fixes above two issues.
I will make patches for other branches if this patch looks okay.
Comments appreciated !
was (Author: krishna.koneru):
{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
cqlsh> SELECT * FROM test.table1;
id | name | enabled | foo
----+------+---------+-----
1 | One | True | Bar
(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv1;
name | id | foo
------+----+-----
(0 rows)
{code}
Logic in ViewUpdateGenerator.java does not update the view row if updated
column is not denormalized in the view.
in the above case,{{enabled}} is not denormalized and so update has not
propagated to the view.View metadata only has pk columns + columns in select
statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that
all non-primary key columns that have filters are denormalized.
(delete does not do this check because it does not have to. This is the reason
row delete worked when {{enabled}} is set to false.)
{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;
name | id | enabled | foo
------+----+---------+------
One | 1 | True | null
(1 rows)
{code}
This happens because of the way liveliness/deletion info is computed in the
view.
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of
all the columns (including non-pk) in the row and uses it in Deletion info when
deleting.
But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest
timestamp of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones
have higher timestamp than live cell from base row.
I have uploaded a [patch for 3.11 |
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
which fixes above two issues.
I will make patches for other branches if this patch looks okay.
Comments appreciated !
> Filtered materialized views missing data
> ----------------------------------------
>
> Key: CASSANDRA-13547
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13547
> Project: Cassandra
> Issue Type: Bug
> Components: Materialized Views
> Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce).
> Reporter: Craig Nicholson
> Assignee: Krishna Dattu Koneru
> Priority: Blocker
> Labels: materializedviews
> Fix For: 3.11.x
>
>
> When creating a materialized view against a base table the materialized view
> does not always reflect the correct data.
> Using the following test schema:
> {code:title=Schema|language=sql}
> DROP KEYSPACE IF EXISTS test;
> CREATE KEYSPACE test
> WITH REPLICATION = {
> 'class' : 'SimpleStrategy',
> 'replication_factor' : 1
> };
> CREATE TABLE test.table1 (
> id int,
> name text,
> enabled boolean,
> foo text,
> PRIMARY KEY (id, name));
> CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo
> FROM test.table1
> WHERE id IS NOT NULL
> AND name IS NOT NULL
> AND enabled = TRUE
> PRIMARY KEY ((name), id);
> CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled
> FROM test.table1
> WHERE id IS NOT NULL
> AND name IS NOT NULL
> AND enabled = TRUE
> PRIMARY KEY ((name), id);
> {code}
> When I insert a row into the base table the materialized views are updated
> appropriately. (+)
> {code:title=Insert row|language=sql}
> cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One',
> TRUE, 'Bar');
> cqlsh> SELECT * FROM test.table1;
> id | name | enabled | foo
> ----+------+---------+-----
> 1 | One | True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
> name | id | foo
> ------+----+-----
> One | 1 | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
> name | id | enabled | foo
> ------+----+---------+-----
> One | 1 | True | Bar
> (1 rows)
> {code}
> Updating the record in the base table and setting enabled to FALSE will
> filter the record from both materialized views. (+)
> {code:title=Disable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
> id | name | enabled | foo
> ----+------+---------+-----
> 1 | One | False | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
> name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
> name | id | enabled | foo
> ------+----+---------+-----
> (0 rows)
> {code}
> However a further update to the base table setting enabled to TRUE should
> include the record in both materialzed views, however only one view
> (table1_mv2) gets updated. (-)
> It appears that only the view (table1_mv2) that returns the filtered column
> (enabled) is updated. (-)
> Additionally columns that are not part of the partiion or clustering key are
> not updated. You can see that the foo column has a null value in table1_mv2.
> (-)
> {code:title=Enable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
> id | name | enabled | foo
> ----+------+---------+-----
> 1 | One | True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
> name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
> name | id | enabled | foo
> ------+----+---------+------
> One | 1 | True | null
> (1 rows)
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]