[ https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829898#comment-17829898 ]
Sam Tunnicliffe commented on CASSANDRA-19130:
---------------------------------------------

The way truncation works is that it writes a timestamp into a system table on each node, associated with the table being truncated (and a commitlog position). Then, when local reads and writes are done against that table, any cells with a timestamp earlier than the truncation timestamp are essentially discarded. If any node misses that message and so doesn't write the timestamp, it won't do this filtering, and so data can be resurrected. This is strictly a one-time operation and there's no way for a node which does miss such a message to catch up later, which is why {{TRUNCATE}} currently requires all nodes to be up.

With TCM, we can improve this by having an entry in the log which contains the truncation timestamp. Then it can be distributed to peers the same way as any other log entry, allowing them to catch up if they miss it. Replicas and coordinators participating in a read already check that they're all up to date with each other and attempt to catch up if not.

We shouldn't have to change how truncation works on the local level, just have {{TruncateStatement}} work by committing a new transform to the CMS. The trickiest bit will be to make sure that the {{execute}} method itself is side-effect free (i.e. it only produces a new ClusterMetadata). The way to do that is with a {{ChangeListener}} which implements a post-commit event to do the work of {{CFS::truncateBlocking}}.

> Implement transactional table truncation
> ----------------------------------------
>
>                 Key: CASSANDRA-19130
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19130
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Consistency/Coordination
>            Reporter: Marcus Eriksson
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> TRUNCATE table should leverage cluster metadata to ensure consistent
> truncation timestamps across all replicas.
> The current implementation depends on all nodes being available, but this
> could be reimplemented as a {{Transformation}}.
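
To make the approach in the comment above concrete, here is a minimal Java sketch of the idea: a log entry carries the truncation timestamp, applying it is a pure function from one metadata version to the next, and the local truncation happens in a post-commit listener. All class and method names here ({{Metadata}}, {{TruncateTransform}}, {{Node}}, {{catchUp}}, {{isLive}}) are hypothetical stand-ins, not Cassandra's actual classes, and the {{ChangeListener}} interface below is a simplified assumption rather than the real TCM API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for cluster metadata: an epoch plus
// the truncation timestamp recorded for each table.
final class Metadata {
    final long epoch;
    final Map<String, Long> truncatedAt;

    Metadata(long epoch, Map<String, Long> truncatedAt) {
        this.epoch = epoch;
        this.truncatedAt = Collections.unmodifiableMap(new HashMap<>(truncatedAt));
    }
}

// Hypothetical log entry carrying the truncation timestamp.
final class TruncateTransform {
    final String table;
    final long timestamp;

    TruncateTransform(String table, long timestamp) {
        this.table = table;
        this.timestamp = timestamp;
    }

    // Side-effect free: derives the next metadata and leaves prev untouched.
    Metadata execute(Metadata prev) {
        Map<String, Long> next = new HashMap<>(prev.truncatedAt);
        next.merge(table, timestamp, Math::max);
        return new Metadata(prev.epoch + 1, next);
    }
}

// Simplified assumption of a post-commit hook.
interface ChangeListener {
    void postCommit(Metadata prev, Metadata next);
}

final class Node {
    Metadata metadata = new Metadata(0, new HashMap<>());

    // Records which tables were truncated locally; a real node would do
    // the work of CFS::truncateBlocking here instead.
    final List<String> truncatedLocally = new ArrayList<>();

    final ChangeListener listener = (prev, next) -> {
        for (Map.Entry<String, Long> e : next.truncatedAt.entrySet()) {
            if (!e.getValue().equals(prev.truncatedAt.get(e.getKey())))
                truncatedLocally.add(e.getKey());
        }
    };

    // Applies every log entry past this node's epoch, so a node that was
    // down when TRUNCATE ran still learns the timestamp later.
    void catchUp(List<TruncateTransform> log) {
        for (int i = (int) metadata.epoch; i < log.size(); i++) {
            Metadata prev = metadata;
            metadata = log.get(i).execute(prev);
            listener.postCommit(prev, metadata);
        }
    }

    // Cells written before the truncation timestamp are filtered out.
    boolean isLive(String table, long cellTimestamp) {
        Long t = metadata.truncatedAt.get(table);
        return t == null || cellTimestamp >= t;
    }
}
```

In this sketch, a node that missed the truncation still treats older cells as live until it calls {{catchUp}} with the log, after which they are filtered; calling {{catchUp}} again is a no-op because the node's epoch already covers the entry.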