Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

David Capwell Tue, 13 May 2025 08:23:32 -0700

That method can be backported even if it’s just for this single case. For the vtable I needed something more friendly so made this for that. It’s also used for dedup logic in repair retry works.

I didn’t look closely, what sets force to be true or false? I assumed this was the force flag in repair (as these are repair snapshots) but sounds like that isn’t the case?

Sent from my iPhone

On May 12, 2025, at 11:58 PM, Štefan Miklošovič <[email protected]> wrote:

David,

that "force" you copied from nodetool, that has nothing to do with "force" in these snapshots but I think you already know that.

org.apache.cassandra.repair.RepairJobDesc#determanisticId is only in 4.1+ included, not in 4.0. That's quite a bummer but we could limit the patch for 4.1+ only and leave 4.0 be.

When we do that, then "force" will become redundant and we can perhaps remove it?

How I look at it is that the introduction of "force" flag in TableRepairManager is the result of a developer trying to consolidate the name of snapshots from parent id / session id to both parent id / parent id but there was the realisation of that being problematic because else branch might be invoked multiple times so it would start to clash. So they said "ok and here we just override and take no matter what" but the implementation of that was not finished as it clearly fails.

So if we moved to determanisticId for both branches of if / else and remove force, we should be good? RepairJobDesc#deterministicId is basically unique? I see it is created from parent id, session id, keyspace, table and ranges. I do not think there would ever be two cases of snapshots like that with exact same values.

WDYT?

Regards

On Mon, May 12, 2025 at 6:55 PM David Capwell <[email protected]> wrote:
> "force" can be true / false. When true, then it will not check if a snapshot exists

My mental node for “force” was only to deal with down nodes, and nothing to do with snapshots… so this feels like a weird behavior

@Option(title = "force", name = {"-force", "--force"}, description = "Use -force to filter out down endpoints”)

Our documentation even only calls out dealing with downed enodes…

> a) we will remove existing snapshot and take a new one

So this is racy and can make repairs brittle. The snapshot is at the RepairSession (global) or RepairJob level (a subset of ranges for a single table, different RepairJobs can work on the same table, just different ranges). With this code path you also have 1 more variable at play that makes things complex: isGlobal

public boolean isGlobal() {return dataCenters.isEmpty() && hosts.isEmpty();}

Global does the full range, where as non-global covers the partial range

So when you do host or dc this snapshot is isolated to the RepairJob and not the session logically, but physically its isolated to the session; which doesn’t make any sense.

> 2) if we do not want to fix "force", then it is most probably just redundant and we would remove it but it order to not fail, we would need to go back to naming snapshots not parent id / parent id but parent id / session id for global and non-global repair respectively (basically we would return to pre-14116 behavior).

org.apache.cassandra.repair.RepairJobDesc#determanisticId

This is scoped to a single RepairJob, so wouldn’t have the issues with concurrency. So if we were going to alter what the snapshot name is, I would strongly prefer this (its also present in the repair vtable, so not some hidden UUID)

I personally feel the lowest risk is to switch to the job name and away from the session name we had… I do wonder about removing this “force” semantic as its not documented to users as far as I can tell, so is there any place that defines this behavior?

> we might just fix the trunk and keep the rest broken, not ideal but

With some patches its hard to tell if something is a bug or a feature… so altering the semantic is that making an improvement to allow better combinations of repair options, or is it fixing a bug (aka worthy of a back port)…. I know most users do parallel repair (default) so this snapshot logic doesn’t impact most users, but its also why its been so long before this issue got noticed… so coverage around snapshots currently isn’t the best.

Sorry for this reply, its less of an answer and more a dump while I think through the problem you stated….

> On May 12, 2025, at 9:18 AM, Štefan Miklošovič <[email protected]> wrote:
>
> Hello,
>
> I am looking for advice for (1). There is quite a quality archaeology to go through. I have asked around already (thanks for that!) but nobody seems to ultimately remember / tell what it is for.
>
> There is (2). "force" can be true / false. When true, then it will not check if a snapshot exists. If it exists and "force" is true, this has never worked - that is what CASSANDRA-20490 is for. It would try to hard link twice and it would error out.
>
> This "force" logic is called here (3). If ParentRepairSession (prs) is global, "force" is false, but if it is not (not global means that some dc was specified), then it is true.
>
> Notice that both branches of that if/else use desc.parentSessionId.toString() as snapshots' name.
>
> This "force" logic was introduced in (4) but it seems to me that it never worked. AFAICT this was just broken on delivery.
>
> One commit before (4), it looked like (5). The first branch of "if" used parentSessionId as a snapshot name and it checked if it exists or not before taking it, the else branch took sessionId (not parent session id) and took a snapshot of that name without any check if it exists or not.
>
> From javadoc on TableRepairManager, "force" means: (introduced in (4) CASSANDRA-14116)
>
> "If force is true, a snapshot should be taken even if one already exists with that name.".
>
> Questions:
>
> 1) do we want to fix the behaviour of "force"? (patch 20490)
> if the answer is yes, what is the "semantics" about this. What does "force" mean exactly? It can mean
> a) we will remove existing snapshot and take a new one
> b) we will add more SSTables onto existing snapshot, we just hardlink more SSTables into same snapshot (that is what 20490 is doing)
>
> 2) if we do not want to fix "force", then it is most probably just redundant and we would remove it but it order to not fail, we would need to go back to naming snapshots not parent id / parent id but parent id / session id for global and non-global repair respectively (basically we would return to pre-14116 behavior).
>
> I think the introduction of "force" flag was there because there might be multiple calls to "else" branch in RepairMessageVerbHandler and calling it all over again with parentSessionId would be erroneous when already there, so "force" was there for not caring and just taking a snapshot again. If we go back to sessionId only, then we do not have this problem I guess.
>
> Is it important for people to keep the name of a snapshot in both cases (global / not global) equal to parentSessionId? Can we just go back?
>
> If we want to keep naming snapshots as done now, we might just do 20490, I was trying to fix older branches as well but it is very difficult, we might just fix the trunk and keep the rest broken, not ideal but ... I am just reinventing a wheel in 4.0+, it is just too cumbersome / complex to copy trunk's behaviour in 20490 to older stuff.
>
> Thanks and regards
>
> (1) https://issues.apache.org/jira/browse/CASSANDRA-20490
> (2) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L84
> (3) https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192
> (4) https://issues.apache.org/jira/browse/CASSANDRA-14116
> (5) https://github.com/apache/cassandra/blob/478c1a9fdf027af0f373f9e26521803ae8775fdb/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L106-L121

Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

Reply via email to