Hello,

I am looking for advice for (1). There is quite a quality archaeology to go
through. I have asked around already (thanks for that!) but nobody seems to
ultimately remember / tell what it is for.

There is (2). "force" can be true / false. When true, then it will not
check if a snapshot exists. If it exists and "force" is true, this has
never worked - that is what CASSANDRA-20490 is for. It would try to hard
link twice and it would error out.

This "force" logic is called here (3). If ParentRepairSession (prs)  is
global, "force" is false, but if it is not (not global means that some dc
was specified), then it is true.

Notice that both branches of that if/else use
desc.parentSessionId.toString() as snapshots' name.

This "force" logic was introduced in (4) but it seems to me that it never
worked. AFAICT this was just broken on delivery.

One commit before (4), it looked like (5). The first branch of "if" used
parentSessionId as a snapshot name and it checked if it exists or not
before taking it, the else branch took sessionId (not parent session id)
and took a snapshot of that name without any check if it exists or not.

>From javadoc on TableRepairManager, "force" means: (introduced in (4)
CASSANDRA-14116)

"If force is true, a snapshot should be taken even if one already exists
with that name.".

Questions:

1) do we want to fix the behaviour of "force"? (patch 20490)
  if the answer is yes, what is the "semantics" about this. What does
"force" mean exactly? It can mean
  a) we will remove existing snapshot and take a new one
  b) we will add more SSTables onto existing snapshot, we just hardlink
more SSTables into same snapshot (that is what 20490 is doing)

2) if we do not want to fix "force", then it is most probably just
redundant and we would remove it but it order to not fail, we would need to
go back to naming snapshots not parent id / parent id but parent id /
session id for global and non-global repair respectively (basically we
would return to pre-14116 behavior).

I think the introduction of "force" flag was there because there might be
multiple calls to "else" branch in RepairMessageVerbHandler and calling it
all over again with parentSessionId would be erroneous when already there,
so "force" was there for not caring and just taking a snapshot again. If we
go back to sessionId only, then we do not have this problem I guess.

Is it important for people to keep the name of a snapshot in both cases
(global / not global) equal to parentSessionId? Can we just go back?

If we want to keep naming snapshots as done now, we might just do 20490, I
was trying to fix older branches as well but it is very difficult, we might
just fix the trunk and keep the rest broken, not ideal but ... I am just
reinventing a wheel in 4.0+, it is just too cumbersome / complex to copy
trunk's behaviour in 20490 to older stuff.

Thanks and regards

(1) https://issues.apache.org/jira/browse/CASSANDRA-20490
(2)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/repair/CassandraTableRepairManager.java#L84
(3)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L185-L192
(4) https://issues.apache.org/jira/browse/CASSANDRA-14116
(5)
https://github.com/apache/cassandra/blob/478c1a9fdf027af0f373f9e26521803ae8775fdb/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java#L106-L121

Reply via email to