[
https://issues.apache.org/jira/browse/CASSANDRA-18105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17714175#comment-17714175
]
Stefan Miklosovic edited comment on CASSANDRA-18105 at 4/19/23 3:43 PM:
------------------------------------------------------------------------
With great help from [~samt], we found the problem. When an index is dropped,
the code eventually calls (1), but the id of the index is the same as the id of
the base table. The call therefore removes the base table's record from the
truncated_at map in system.local. In other words, TRUNCATE puts the record
there, and the next DROP of an index removes it again.
The index has the same id as the base table because of (2).
I was told that there is a reason for sharing the id between the base table
and the index, but we should probably revisit this decision.
I am personally not sure why it is done like that.
The fix is a simple check that skips removing the truncated_at entry when the
table metadata belongs to an index:
{code:java}
if (!metadata.get().isIndex())
    SystemKeyspace.removeTruncationRecord(metadata.id);
{code}
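To make the id collision concrete, here is a minimal, self-contained sketch of the behaviour described above. It is not Cassandra code: the class TruncationRecordSketch, the TableMeta type, and the methods truncate, dropUnguarded and dropGuarded are hypothetical stand-ins, and the map only models the truncated_at map under the assumption that it is keyed by table id.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class TruncationRecordSketch {
    // Hypothetical stand-in for table metadata; an index reuses the base table's id.
    static final class TableMeta {
        final UUID id;
        final boolean isIndex;

        TableMeta(UUID id, boolean isIndex) {
            this.id = id;
            this.isIndex = isIndex;
        }
    }

    // Models the truncated_at map, assumed here to be keyed by table id.
    final Map<UUID, Long> truncatedAt = new HashMap<>();

    // TRUNCATE records when the table was truncated.
    void truncate(TableMeta table, long timestamp) {
        truncatedAt.put(table.id, timestamp);
    }

    // Behaviour before the fix: any drop removes the record stored under metadata.id.
    void dropUnguarded(TableMeta meta) {
        truncatedAt.remove(meta.id);
    }

    // Behaviour with the fix: the record is left alone when the metadata is an index.
    void dropGuarded(TableMeta meta) {
        if (!meta.isIndex)
            truncatedAt.remove(meta.id);
    }

    public static void main(String[] args) {
        UUID sharedId = UUID.randomUUID();
        TableMeta base = new TableMeta(sharedId, false);
        TableMeta index = new TableMeta(sharedId, true);

        TruncationRecordSketch before = new TruncationRecordSketch();
        before.truncate(base, 1L);
        before.dropUnguarded(index);
        System.out.println("record kept without guard: " + before.truncatedAt.containsKey(base.id));

        TruncationRecordSketch after = new TruncationRecordSketch();
        after.truncate(base, 1L);
        after.dropGuarded(index);
        System.out.println("record kept with guard: " + after.truncatedAt.containsKey(base.id));
    }
}
```

In this model the unguarded drop of the index loses the base table's record, while the guarded drop keeps it.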
It is also worth mentioning that the problem does not show up until the node
is restarted. Upon restart, the commit log is replayed, and the replay looks
into system.local to see whether a table was truncated so that it does not
replay the truncated mutations. However, since DROP INDEX removed that table's
record from the truncated_at map, everything is replayed and the data is
resurrected.
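The effect on replay can be illustrated with a deliberately simplified sketch. It assumes that replay skips a mutation when a truncation record exists for its table and the mutation is not newer than that record; the real replayer's bookkeeping is more involved. The names ReplaySketch, Mutation and replay are hypothetical and do not correspond to Cassandra classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ReplaySketch {
    // Hypothetical commit log entry: which table it belongs to and when it was written.
    static final class Mutation {
        final UUID tableId;
        final long writtenAt;
        final String row;

        Mutation(UUID tableId, long writtenAt, String row) {
            this.tableId = tableId;
            this.writtenAt = writtenAt;
            this.row = row;
        }
    }

    // Returns the rows that replay would bring back, given the truncation records.
    static List<String> replay(List<Mutation> log, Map<UUID, Long> truncatedAt) {
        List<String> restored = new ArrayList<>();
        for (Mutation m : log) {
            Long cutoff = truncatedAt.get(m.tableId);
            // Skip only when a record exists and the mutation predates the truncation.
            if (cutoff != null && m.writtenAt <= cutoff)
                continue;
            restored.add(m.row);
        }
        return restored;
    }

    public static void main(String[] args) {
        UUID table = UUID.randomUUID();
        List<Mutation> log = Arrays.asList(new Mutation(table, 5L, "row-1"));

        Map<UUID, Long> withRecord = new HashMap<>();
        withRecord.put(table, 10L);
        System.out.println("with record: " + replay(log, withRecord));

        // Same log, but the record has been removed, as after DROP INDEX.
        System.out.println("without record: " + replay(log, new HashMap<>()));
    }
}
```

With the record present nothing is restored; without it the truncated row is replayed, which matches the resurrection seen after a restart.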
(1)
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L695]
(2)
[https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/internal/CassandraIndex.java#L739]
> TRUNCATED data come back after a restart or upgrade
> ---------------------------------------------------
>
> Key: CASSANDRA-18105
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18105
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/2i Index
> Reporter: Ke Han
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.0.x, 4.1.x, 5.x
>
>
> When we use the TRUNCATE command to delete all data in the table, the deleted
> data comes back after a node restart or upgrade. This problem happens on the
> latest releases (2.2.19, 3.0.28, and 4.0.7).
> h1. Steps to reproduce
> h2. To reproduce it on release 3.0.28 or 4.0.7
> Start up a single Cassandra node using the default configuration and execute
> the following cqlsh commands.
> {code:java}
> CREATE KEYSPACE IF NOT EXISTS ks WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
> CREATE TABLE ks.tb (c3 TEXT, c4 TEXT, c2 INT, c1 TEXT, PRIMARY KEY (c1, c2, c3));
> INSERT INTO ks.tb (c3, c1, c2) VALUES ('val1','val2',1);
> CREATE INDEX IF NOT EXISTS tb ON ks.tb ( c3);
> TRUNCATE TABLE ks.tb;
> DROP INDEX IF EXISTS ks.tb; {code}
> Execute a read command
> {code:java}
> cqlsh> SELECT c2 FROM ks.tb;
> c2
> ----
> (0 rows) {code}
> Then we flush the node and kill the Cassandra daemon:
> {code:java}
> bin/nodetool flush
> pgrep -f cassandra | xargs kill -9 {code}
> We restart the node. When the node has started, perform the same read, and
> the deleted data comes back again.
> {code:java}
> cqlsh> SELECT c2 FROM ks.tb;
> c2
> ----
> 1
> (1 rows) {code}
> h2. To reproduce it on release 2.2.19
> We don't need to kill the Cassandra daemon; using bin/nodetool stopdaemon is
> enough. The other steps are the same as for reproducing it on 4.0.7 or 3.0.28.
> {code:java}
> bin/nodetool -h ::FFFF:127.0.0.1 flush
> bin/nodetool -h ::FFFF:127.0.0.1 stopdaemon{code}
>
> I have put the full logs for reproducing it on releases 4.0.7 and 2.2.19 in
> the comments.