Matt Byrd created CASSANDRA-21173:
-------------------------------------
Summary: Snapshots from tables without table-id embedded in their
folder name are not loaded by SnapshotLoader
Key: CASSANDRA-21173
URL: https://issues.apache.org/jira/browse/CASSANDRA-21173
Project: Apache Cassandra
Issue Type: Bug
Components: Local/Snapshots, Local/Startup and Shutdown
Reporter: Matt Byrd
Assignee: Matt Byrd
Tables created prior to 2.1 do not have a table-id embedded in their table
folder name.
This is handled correctly in Directories.java (see the constructor).
Unfortunately, SnapshotLoader uses a regex which attempts to extract the
table-id from the folder name, and hence skips over any tables created prior
to 2.1.
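To illustrate the failure mode, here is a minimal sketch in Python. It is not
the actual SnapshotLoader code: both patterns and both function names are
hypothetical, and it assumes that post-2.1 table folders are named
`<table>-<32 hex chars>` while older folders are just `<table>`.

```python
import re

# Hypothetical strict pattern: REQUIRES a "-<32 hex chars>" table-id suffix,
# so a pre-2.1 folder name such as "users" does not match at all.
STRICT = re.compile(r"^(?P<table>.+)-(?P<id>[0-9a-f]{32})$")

# Hypothetical tolerant pattern: the table-id suffix is optional.
TOLERANT = re.compile(r"^(?P<table>.+?)(?:-(?P<id>[0-9a-f]{32}))?$")


def parse_strict(folder_name):
    """Return (table, table_id), or None when the folder has no table-id."""
    m = STRICT.match(folder_name)
    return (m.group("table"), m.group("id")) if m else None


def parse_tolerant(folder_name):
    """Return (table, table_id or None); old-style folders are still accepted."""
    m = TOLERANT.match(folder_name)
    return (m.group("table"), m.group("id")) if m else None
```

With the strict pattern an old-style folder yields no match and would be
skipped; with the tolerant one it is kept, but the table-id still has to come
from somewhere else.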
The end result is that snapshots of these tables are not visible in nodetool
listsnapshots and, more importantly, cannot be cleared via nodetool
clearsnapshot. This was noticed upon a major upgrade to 5.0.
I've observed this on 5.0. From reading the code, the situation appears likely
to be improved in 5.1, in that triggering the problem now additionally requires
a restart.
Some related tickets:
Introduction of table-id and backwards compatible handling of old folders
originally here:
https://issues.apache.org/jira/browse/CASSANDRA-5202
Machinery to list snapshots which doesn’t handle old format was added here:
https://issues.apache.org/jira/browse/CASSANDRA-16843
https://github.com/apache/cassandra/commit/31aa17a2a3b18bdda723123cad811f075287807d
There was some discussion at the time of not handling pre 2.1 tables here:
https://issues.apache.org/jira/browse/CASSANDRA-16843?focusedCommentId=17440088&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17440088
Then nodetool clearsnapshot stopped working here with:
https://issues.apache.org/jira/browse/CASSANDRA-17757
Things improve a bit in 5.1 with
https://issues.apache.org/jira/browse/CASSANDRA-18111
Now we no longer try to load the snapshots via SnapshotLoader in their entirety
before deciding if we can clear them, but instead make use of SnapshotManager.
Whilst snapshots taken while the JVM is running are now visible and clearable,
from reading the code it appears that upon restart we lose that information and
cannot view/clear snapshots created before the restart.
One solution to handle these pre-2.1 tables is to include the table-id in the
manifest.json; then we'll be able to read this information upon restart if it
is not available from the folder name.
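As a rough sketch of that fallback (in Python, not the actual patch): the
function name, the `table_id` manifest key, and the regex are all
hypothetical, and the layout `<table folder>/snapshots/<tag>/manifest.json`
is assumed.

```python
import json
import re
from pathlib import Path

# Assumed post-2.1 folder naming: "<table>-<32 hex chars>".
TABLE_DIR = re.compile(r"^.+-([0-9a-f]{32})$")


def resolve_table_id(snapshot_dir):
    """Resolve the table-id for a snapshot directory.

    Prefer the id embedded in the table folder name; if the folder has
    none (pre-2.1 table), fall back to a hypothetical "table_id" key in
    the snapshot's manifest.json. Returns None if neither is available.
    """
    snapshot_dir = Path(snapshot_dir)
    table_dir = snapshot_dir.parent.parent  # <table folder>/snapshots/<tag>
    m = TABLE_DIR.match(table_dir.name)
    if m:
        return m.group(1)

    manifest = snapshot_dir / "manifest.json"
    if not manifest.is_file():
        return None
    try:
        data = json.loads(manifest.read_text())
    except ValueError:  # malformed manifest: treat as "no id available"
        return None
    return data.get("table_id") if isinstance(data, dict) else None
```

Snapshots taken before such a change would have no id in their manifest, so
they would still resolve to nothing under this sketch.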
Another possibility, which doesn't fix as many problems, is just to expose
something via JMX/nodetool to allow operators to bypass the snapshot loading
mechanism and directly clear the old pre-2.1 snapshots.
A more involved and risky change would be to migrate all this existing data in
different folder structures to a new consistent folder structure, but this
would likely deserve its own JIRA at least.
I have a patch locally against trunk for the first approach, just storing the
tableId in the manifest.json, and will run it against CI.
I'll have a further think about whether there are any other approaches; if
anyone has any ideas, let me know.
Another thing to consider is where we should apply this change.
Probably at a minimum 5.0, since that's where one can no longer nodetool
clearsnapshot on certain tables and the effect is a bit worse there than in 5.1.