[ 
https://issues.apache.org/jira/browse/CASSANDRA-13547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040527#comment-16040527
 ] 

Krishna Dattu Koneru edited comment on CASSANDRA-13547 at 6/21/17 6:22 AM:
---------------------------------------------------------------------------


{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';

cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
{code}
{code:title=Problem 1 - Missing Updates|borderStyle=solid}

cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----

(0 rows)
{code}

Logic in ViewUpdateGenerator.java does not update the view row if updated 
column is not denormalized in the view.
in the above case,{{enabled}} is not denormalized and so update has not 
propagated to the view.View metadata only has pk columns + columns in select 
statement of create view.
Now that filtering on non-pk columns is supported , we have to make sure that 
all non-primary key columns that have filters are denormalized.
(delete does not do this check because it does not have to. This is the reason 
row delete worked when {{enabled}} is set to false.)

{code:title=Problem 2 - incorrect non-pk tombstones|borderStyle=solid}
cqlsh> SELECT * FROM test.table1_mv2;

 name | id | enabled | foo
------+----+---------+------
  One |  1 |    True | null

(1 rows)
{code}

This happens because of the way liveliness/deletion info  is computed in the 
view. 
{{computeTimestampForEntryDeletion())}} method takes the biggest timestamp of 
all the columns (including non-pk) in the row and uses it in Deletion info when 
deleting.

But,when inserting/updating, {{computeLivenessInfoForEntry()}} uses the biggest 
timestamp of the primary keys for liveliness info.
This causes non-pk columns to be treated as deleted because view tombstones 
have higher timestamp than live cell from base row.


I have uploaded a [patch for 3.11 | 
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547]branch
 which fixes above two issues.

I will make patches for other branches if this patch looks okay.
Comments appreciated !


was (Author: krishna.koneru):
I have uploaded a patch (with unittest) for problem 1 in above comment @ 
https://github.com/apache/cassandra/compare/cassandra-3.11...krishna-koneru:cassandra-3.11-13547

below problem addressed by this patch  (from the description)
{code}
cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
cqlsh> SELECT * FROM test.table1;

 id | name | enabled | foo
----+------+---------+-----
  1 |  One |    True | Bar

(1 rows)
cqlsh> SELECT * FROM test.table1_mv1;

 name | id | foo
------+----+-----
{code}

This happens because view metadata does not have column "enabled" .
The decision "should this base table update, update view as well?" is made by 
looking at view metadata (among other things).
Due to this, some updates to base table do not result in updates to MV even if 
update qualifies .

This patch disallows usage of any non PK columns in WHERE clause if they are 
not in SELECT list.

Comments are appreciated !


> Filtered materialized views missing data
> ----------------------------------------
>
>                 Key: CASSANDRA-13547
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13547
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Materialized Views
>         Environment: Official Cassandra 3.10 Docker image (ID 154b919bf8ce).
>            Reporter: Craig Nicholson
>            Assignee: Krishna Dattu Koneru
>            Priority: Blocker
>              Labels: materializedviews
>             Fix For: 3.11.x
>
>
> When creating a materialized view against a base table the materialized view 
> does not always reflect the correct data.
> Using the following test schema:
> {code:title=Schema|language=sql}
> DROP KEYSPACE IF EXISTS test;
> CREATE KEYSPACE test
>   WITH REPLICATION = { 
>    'class' : 'SimpleStrategy', 
>    'replication_factor' : 1 
>   };
> CREATE TABLE test.table1 (
>                 id int,
>                 name text,
>                 enabled boolean,
>                 foo text,
>                 PRIMARY KEY (id, name));
> CREATE MATERIALIZED VIEW test.table1_mv1 AS SELECT id, name, foo
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> CREATE MATERIALIZED VIEW test.table1_mv2 AS SELECT id, name, foo, enabled
>                 FROM test.table1
>                 WHERE id IS NOT NULL 
>                 AND name IS NOT NULL 
>                 AND enabled = TRUE
>                 PRIMARY KEY ((name), id);
> {code}
> When I insert a row into the base table the materialized views are updated 
> appropriately. (+)
> {code:title=Insert row|language=sql}
> cqlsh> INSERT INTO test.table1 (id, name, enabled, foo) VALUES (1, 'One', 
> TRUE, 'Bar');
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
>   One |  1 | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
>   One |  1 |    True | Bar
> (1 rows)
> {code}
> Updating the record in the base table and setting enabled to FALSE will 
> filter the record from both materialized views. (+)
> {code:title=Disable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = FALSE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |   False | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+-----
> (0 rows)
> {code}
> However a further update to the base table setting enabled to TRUE should 
> include the record in both materialzed views, however only one view 
> (table1_mv2) gets updated. (-)
> It appears that only the view (table1_mv2) that returns the filtered column 
> (enabled) is updated. (-)
> Additionally columns that are not part of the partiion or clustering key are 
> not updated. You can see that the foo column has a null value in table1_mv2. 
> (-)
> {code:title=Enable the row|language=sql}
> cqlsh> UPDATE test.table1 SET enabled = TRUE WHERE id = 1 AND name = 'One';
> cqlsh> SELECT * FROM test.table1;
>  id | name | enabled | foo
> ----+------+---------+-----
>   1 |  One |    True | Bar
> (1 rows)
> cqlsh> SELECT * FROM test.table1_mv1;
>  name | id | foo
> ------+----+-----
> (0 rows)
> cqlsh> SELECT * FROM test.table1_mv2;
>  name | id | enabled | foo
> ------+----+---------+------
>   One |  1 |    True | null
> (1 rows)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to