[ 
https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829898#comment-17829898
 ] 

Sam Tunnicliffe commented on CASSANDRA-19130:
---------------------------------------------

Truncation works by writing a timestamp (and a commitlog position) into a
system table on each node, associated with the table being truncated. Then,
when local reads and writes are done against that table, any cells with a
timestamp earlier than the truncation timestamp are essentially discarded. If
any node misses the truncation message and so doesn't write the timestamp, it
won't do this filtering and so data can be resurrected. This is a strictly
one-time operation and there's no way for a node which does miss such a message
to catch up later, which is why {{TRUNCATE}} currently requires all nodes to be up.
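
To make the resurrection risk concrete, here is a minimal sketch of the filtering described above. It is a toy model, not Cassandra's code: {{Cell}}, {{TruncationFilter}}, {{recordTruncation}} and {{visible}} are hypothetical names invented for illustration, and the real system table and commitlog position are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a stored cell; not a Cassandra class.
final class Cell {
    final String name;
    final long timestamp;

    Cell(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
    }
}

// Hypothetical per-node, per-table truncation record.
final class TruncationFilter {
    // Long.MIN_VALUE models a node that never wrote a truncation timestamp.
    private long truncatedAt = Long.MIN_VALUE;

    // Models the node receiving the truncation message and persisting it.
    void recordTruncation(long timestamp) {
        truncatedAt = Math.max(truncatedAt, timestamp);
    }

    // Cells older than the truncation timestamp are discarded.
    List<Cell> visible(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : cells) {
            if (c.timestamp >= truncatedAt) {
                out.add(c);
            }
        }
        return out;
    }
}
```

A node that called {{recordTruncation(100)}} hides a cell written at timestamp 50, while a node that missed the message still returns it, which is the resurrection case.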

With TCM, we can improve this by having an entry in the log which contains the
truncation timestamp. It can then be distributed to peers the same way as any
other log entry, allowing them to catch up if they miss it. Replicas and
coordinators participating in a read already check that they're all up to date
with each other and attempt to catch up if not.
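
As a rough illustration of why a log entry fixes the missed-message problem, the sketch below models an ordered log that a lagging peer can replay. Everything here is an assumption made for the example: {{LogEntry}}, {{MetadataLog}}, {{Peer}}, {{entriesAfter}} and {{catchUp}} are hypothetical names and do not reflect the actual TCM classes or their replication mechanics.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical log entry carrying a truncation timestamp for one table.
final class LogEntry {
    final long epoch;
    final String table;
    final long truncatedAt;

    LogEntry(long epoch, String table, long truncatedAt) {
        this.epoch = epoch;
        this.table = table;
        this.truncatedAt = truncatedAt;
    }
}

// Hypothetical ordered log; epochs start at 1 and increase by 1 per entry.
final class MetadataLog {
    private final List<LogEntry> entries = new ArrayList<>();

    long append(String table, long truncatedAt) {
        long epoch = entries.size() + 1;
        entries.add(new LogEntry(epoch, table, truncatedAt));
        return epoch;
    }

    // Entry with epoch e lives at index e - 1, so entries after e start at index e.
    List<LogEntry> entriesAfter(long epoch) {
        return new ArrayList<>(entries.subList((int) epoch, entries.size()));
    }
}

// Hypothetical peer that replays whatever it has not applied yet.
final class Peer {
    long appliedEpoch = 0;
    final Map<String, Long> truncatedAt = new HashMap<>();

    void catchUp(MetadataLog log) {
        for (LogEntry e : log.entriesAfter(appliedEpoch)) {
            truncatedAt.merge(e.table, e.truncatedAt, Long::max);
            appliedEpoch = e.epoch;
        }
    }
}
```

Unlike the one-time message, a peer that was down when the entry was appended simply calls {{catchUp}} later and ends up with the same truncation timestamp as everyone else.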

We shouldn't have to change how truncation works at the local level, just have
{{TruncateStatement}} work by committing a new transform to the CMS. The
trickiest bit will be to make sure that the {{execute}} method itself is
side-effect free (i.e. it only produces a new {{ClusterMetadata}}). The way to do
that is with a {{ChangeListener}} which implements a post-commit event to do
the work of {{CFS::truncateBlocking}}.
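
The split between a pure {{execute}} and a post-commit listener could look roughly like the sketch below. This is only a shape, under stated assumptions: {{Metadata}}, {{TruncateTransform}}, {{PostCommitListener}} and {{LocalTruncator}} are hypothetical stand-ins, their signatures are not those of the real {{ClusterMetadata}}, {{Transformation}} or {{ChangeListener}}, and the list append is a placeholder for the real local truncation work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable metadata snapshot; not the real ClusterMetadata.
final class Metadata {
    final long epoch;
    final Map<String, Long> truncations;

    Metadata(long epoch, Map<String, Long> truncations) {
        this.epoch = epoch;
        this.truncations = Collections.unmodifiableMap(new HashMap<>(truncations));
    }
}

// Hypothetical transform: execute only derives a new snapshot, nothing else.
final class TruncateTransform {
    final String table;
    final long timestamp;

    TruncateTransform(String table, long timestamp) {
        this.table = table;
        this.timestamp = timestamp;
    }

    Metadata execute(Metadata prev) {
        Map<String, Long> next = new HashMap<>(prev.truncations);
        next.put(table, timestamp);
        return new Metadata(prev.epoch + 1, next);
    }
}

// Hypothetical post-commit hook; side effects belong here, not in execute.
interface PostCommitListener {
    void onCommit(Metadata prev, Metadata next);
}

final class LocalTruncator implements PostCommitListener {
    // Placeholder for the local truncation work; records which tables it would truncate.
    final List<String> truncated = new ArrayList<>();

    @Override
    public void onCommit(Metadata prev, Metadata next) {
        for (Map.Entry<String, Long> e : next.truncations.entrySet()) {
            Long before = prev.truncations.get(e.getKey());
            if (before == null || before < e.getValue()) {
                truncated.add(e.getKey());
            }
        }
    }
}
```

Because {{execute}} never touches local state, it can be replayed safely on any node; the listener reacts only when the committed snapshot actually carries a new or newer truncation timestamp.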

> Implement transactional table truncation
> ----------------------------------------
>
>                 Key: CASSANDRA-19130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19130
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Consistency/Coordination
>            Reporter: Marcus Eriksson
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> TRUNCATE table should leverage cluster metadata to ensure consistent 
> truncation timestamps across all replicas. The current implementation depends 
> on all nodes being available, but this could be reimplemented as a 
> {{Transformation}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
