[
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17047065#comment-17047065
]
David Capwell commented on CASSANDRA-15564:
-------------------------------------------
Thanks [~bdeggleston]!
* I am good with exception over Either, ill make that change soon.
* Retry has come up for me in the context of tests, so I don't mind moving it
into a test package. I see test/unit and test/distributed are in the same
compile context (unit classes invoke distributed classes), so I should be able
to put this in test/unit
* "Usage of final" yeah, I added them because [~djoshi] asked me to. [~djoshi]
thoughts?
Clarification:
bq. For example, initial table filtering could have its own method, calculating
neighbors and common ranges could have it's own method, etc, etc. This would
make runMayThrow a lot easier to understand.
so we are on the same page, you are saying something like the below example?
{code}
void run()
{
try
{
// all methods rely on mutable state, so update fields
// setup
validateInput();
calculateNeighbors();
groupCommonRanges();
...
// start
repair().join();
}
catch (Exception e)
{
failure(e);
}
}
{code}
I am ok with that. I mostly did it this way for two reasons
1) attempt to remove mutable state (tracing was a problem, didn't go as well as
I hoped...)
2) group based off "life-cycle", the core reason for this was to make it clear
when we do a state change so the parent JIRA could track it.
bq. I don't understand your comment "Why is this before start when its
publishing the start event? For backwards compatability (...)" what's incorrect
about logging that we're starting the repair command there?
Sorry, its bace to the "life-cycle" bit. I personally classified all the logic
there as "setup" and have start happening when we call
SystemDistributedKeyspace.startParentRepair and
ActiveRepairService.instance.prepareForRepair. In the other JIRA (this code is
extracted from that one to make it smaller), I keep track of each time we
transition between life cycle phases and I classified those two methods as the
actual time we do "start".
I moved the event/log into the start method, but found out that python dtest
was failing because it expected a different order, by moving it back here the
tests were passing again. So it was more "why do we send the START notification
before we are in the START phase?", so was trying to call out to any future
reader that this is there at that point in the code for compatibility; mostly
python dtests.
> Refactor repair coordinator so errors are consistent
> ----------------------------------------------------
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
> Issue Type: Sub-task
> Components: Consistency/Repair
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Labels: pull-request-available
> Time Spent: 10h 50m
> Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated
> out.
> Currently the repair coordinator special cases the exit cases at each call
> site; this makes it so that errors can be inconsistent and there are cases
> where proper complete isn't done (proper notifications, and forgetting to
> update ActiveRepairService).
> [Circle
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]