Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

Jaydeep Chovatia Tue, 13 May 2025 21:07:07 -0700

>So if we moved to determanisticId for both branches of if / else and
remove force, we should be good? RepairJobDesc#deterministicId is basically
unique?
The challenge with using RepairJobDesc#deterministicId in the SNAPSHOT_MSG is
that it will create N snapshots for N token-splits for a table. However,
CLEANUP_MSG is only invoked once for a table, complicating the cleanup of
the snapshots.


Jaydeep



On Tue, May 13, 2025 at 8:56 AM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> That true / false for "force" argument seems to be purely derived from
> whether it is global repair (force set to false) or not (set to true).
>
>
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192
>
> On Tue, May 13, 2025 at 5:25 PM David Capwell <dcapw...@apple.com> wrote:
>
>> That method can be backported even if it’s just for this single case. For
>> the vtable I needed something more friendly so made this for that. It’s
>> also used for dedup logic in repair retry works.
>>
>> I didn’t look closely, what sets force to be true or false? I assumed
>> this was the force flag in repair (as these are repair snapshots) but
>> sounds like that isn’t the case?
>>
>> Sent from my iPhone
>>
>> On May 12, 2025, at 11:58 PM, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>> 
>> David,
>>
>> that "force" you copied from nodetool, that has nothing to do with
>> "force" in these snapshots but I think you already know that.
>>
>> org.apache.cassandra.repair.RepairJobDesc#determanisticId is only in 4.1+
>> included, not in 4.0. That's quite a bummer but we could limit the patch
>> for 4.1+ only and leave 4.0 be.
>>
>> When we do that, then "force" will become redundant and we can perhaps
>> remove it?
>>
>> How I look at it is that the introduction of "force" flag in
>> TableRepairManager is the result of a developer trying to consolidate the
>> name of snapshots from parent id / session id to both parent id / parent id
>> but there was the realisation of that being problematic because else branch
>> might be invoked multiple times so it would start to clash. So they said
>> "ok and here we just override and take no matter what" but the
>> implementation of that was not finished as it clearly fails.
>>
>> So if we moved to determanisticId for both branches of if / else and
>> remove force, we should be good? RepairJobDesc#deterministicId is basically
>> unique? I see it is created from parent id, session id, keyspace, table and
>> ranges. I do not think there would ever be two cases of snapshots like that
>> with exact same values.
>>
>> WDYT?
>>
>> Regards
>>
>> On Mon, May 12, 2025 at 6:55 PM David Capwell <dcapw...@apple.com> wrote:
>>
>>> > "force" can be true / false. When true, then it will not check if a
>>> snapshot exists
>>>
>>>
>>> My mental node for “force” was only to deal with down nodes, and nothing
>>> to do with snapshots… so this feels like a weird behavior
>>>
>>> @Option(title = "force", name = {"-force", "--force"}, description =
>>> "Use -force to filter out down endpoints”)
>>>
>>> Our documentation even only calls out dealing with downed enodes…
>>>
>>>
>>> >   a) we will remove existing snapshot and take a new one
>>>
>>> So this is racy and can make repairs brittle.  The snapshot is at the
>>> RepairSession (global) or RepairJob level (a subset of ranges for a single
>>> table, different RepairJobs can work on the same table, just different
>>> ranges). With this code path you also have 1 more variable at play that
>>> makes things complex: isGlobal
>>>
>>> public boolean isGlobal() {return dataCenters.isEmpty() &&
>>> hosts.isEmpty();}
>>>
>>> Global does the full range, where as non-global covers the partial range
>>>
>>> So when you do host or dc this snapshot is isolated to the RepairJob and
>>> not the session logically, but physically its isolated to the session;
>>> which doesn’t make any sense.
>>>
>>> > 2) if we do not want to fix "force", then it is most probably just
>>> redundant and we would remove it but it order to not fail, we would need to
>>> go back to naming snapshots not parent id / parent id but parent id /
>>> session id for global and non-global repair respectively (basically we
>>> would return to pre-14116 behavior).
>>>
>>> org.apache.cassandra.repair.RepairJobDesc#determanisticId
>>>
>>> This is scoped to a single RepairJob, so wouldn’t have the issues with
>>> concurrency.  So if we were going to alter what the snapshot name is, I
>>> would strongly prefer this (its also present in the repair vtable, so not
>>> some hidden UUID)
>>>
>>> I personally feel the lowest risk is to switch to the job name and away
>>> from the session name we had… I do wonder about removing this “force”
>>> semantic as its not documented to users as far as I can tell, so is there
>>> any place that defines this behavior?
>>>
>>> > we might just fix the trunk and keep the rest broken, not ideal but
>>>
>>> With some patches its hard to tell if something is a bug or a feature…
>>> so altering the semantic is that making an improvement to allow better
>>> combinations of repair options, or is it fixing a bug (aka worthy of a back
>>> port)…. I know most users do parallel repair (default) so this snapshot
>>> logic doesn’t impact most users, but its also why its been so long before
>>> this issue got noticed… so coverage around snapshots currently isn’t the
>>> best.
>>>
>>> Sorry for this reply, its less of an answer and more a dump while I
>>> think through the problem you stated….
>>>
>>> > On May 12, 2025, at 9:18 AM, Štefan Miklošovič <smikloso...@apache.org>
>>> wrote:
>>> >
>>> > Hello,
>>> >
>>> > I am looking for advice for (1). There is quite a quality archaeology
>>> to go through. I have asked around already (thanks for that!) but nobody
>>> seems to ultimately remember / tell what it is for.
>>> >
>>> > There is (2). "force" can be true / false. When true, then it will not
>>> check if a snapshot exists. If it exists and "force" is true, this has
>>> never worked - that is what CASSANDRA-20490 is for. It would try to hard
>>> link twice and it would error out.
>>> >
>>> > This "force" logic is called here (3). If ParentRepairSession (prs)
>>> is global, "force" is false, but if it is not (not global means that some
>>> dc was specified), then it is true.
>>> >
>>> > Notice that both branches of that if/else use
>>> desc.parentSessionId.toString() as snapshots' name.
>>> >
>>> > This "force" logic was introduced in (4) but it seems to me that it
>>> never worked. AFAICT this was just broken on delivery.
>>> >
>>> > One commit before (4), it looked like (5). The first branch of "if"
>>> used parentSessionId as a snapshot name and it checked if it exists or not
>>> before taking it, the else branch took sessionId (not parent session id)
>>> and took a snapshot of that name without any check if it exists or not.
>>> >
>>> > From javadoc on TableRepairManager, "force" means: (introduced in (4)
>>> CASSANDRA-14116)
>>> >
>>> > "If force is true, a snapshot should be taken even if one already
>>> exists with that name.".
>>> >
>>> > Questions:
>>> >
>>> > 1) do we want to fix the behaviour of "force"? (patch 20490)
>>> >   if the answer is yes, what is the "semantics" about this. What does
>>> "force" mean exactly? It can mean
>>> >   a) we will remove existing snapshot and take a new one
>>> >   b) we will add more SSTables onto existing snapshot, we just
>>> hardlink more SSTables into same snapshot (that is what 20490 is doing)
>>> >
>>> > 2) if we do not want to fix "force", then it is most probably just
>>> redundant and we would remove it but it order to not fail, we would need to
>>> go back to naming snapshots not parent id / parent id but parent id /
>>> session id for global and non-global repair respectively (basically we
>>> would return to pre-14116 behavior).
>>> >
>>> > I think the introduction of "force" flag was there because there might
>>> be multiple calls to "else" branch in RepairMessageVerbHandler and calling
>>> it all over again with parentSessionId would be erroneous when already
>>> there, so "force" was there for not caring and just taking a snapshot
>>> again. If we go back to sessionId only, then we do not have this problem I
>>> guess.
>>> >
>>> > Is it important for people to keep the name of a snapshot in both
>>> cases (global / not global) equal to parentSessionId? Can we just go back?
>>> >
>>> > If we want to keep naming snapshots as done now, we might just do
>>> 20490, I was trying to fix older branches as well but it is very difficult,
>>> we might just fix the trunk and keep the rest broken, not ideal but ... I
>>> am just reinventing a wheel in 4.0+, it is just too cumbersome / complex to
>>> copy trunk's behaviour in 20490 to older stuff.
>>> >
>>> > Thanks and regards
>>> >
>>> > (1) https://issues.apache.org/jira/browse/CASSANDRA-20490
>>> > (2)
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L84
>>> > (3)
>>> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192
>>> > (4) https://issues.apache.org/jira/browse/CASSANDRA-14116
>>> > (5)
>>> https://github.com/apache/cassandra/blob/478c1a9fdf027af0f373f9e26521803ae8775fdb/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L106-L121
>>>
>>>

Re: CASSANDRA-20490 Question about broken forced ephemeral snapshots during repair

Reply via email to