Sylvain Lebresne created CASSANDRA-13910:
--------------------------------------------

             Summary: Consider deprecating (then removing) 
read_repair_chance/dclocal_read_repair_chance
                 Key: CASSANDRA-13910
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13910
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
            Priority: Minor


First, let me clarify so this is not misunderstood that I'm not *at all* 
suggesting to remove the read-repair mechanism of detecting and repairing 
inconsistencies between read responses: that mechanism is imo fine and useful.  
But the {{read_repair_chance}} and {{dclocal_read_repair_chance}} have never 
been about _enabling_ that mechanism, they are about querying all replicas 
(even when this is not required by the consistency level) for the sole purpose 
of maybe read-repairing some of the replica that wouldn't have been queried 
otherwise. Which btw, bring me to reason 1 for considering their removal: their 
naming/behavior is super confusing. Over the years, I've seen countless users 
(and not only newbies) misunderstanding what those options do, and as a 
consequence misunderstand when read-repair itself was happening.

But my 2nd reason for suggesting this is that I suspect 
{{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially nowadays, 
more harmful than anything else when enabled. When those option kick in, what 
you trade-off is additional resources consumption (all nodes have to execute 
the read) for a _fairly remote chance_ of having some inconsistencies repaired 
on _some_ replica _a bit faster_ than they would otherwise be. To justify that 
last part, let's recall that:
# most inconsistencies are actually fixed by hints in practice; and in the case 
where a node stay dead for a long time so that hints ends up timing-out, you 
really should repair the node when it comes back (if not simply 
re-bootstrapping it).  Read-repair probably don't fix _that_ much stuff in the 
first place.
# again, read-repair do happen without those options kicking in. If you do 
reads at {{QUORUM}}, inconsistencies will eventually get read-repaired all the 
same.  Just a tiny bit less quickly.
# I suspect almost everyone use a low "chance" for those options at best 
(because the extra resources consumption is real), so at the end of the day, 
it's up to chance how much faster this fixes inconsistencies.

Overall, I'm having a hard time imagining real cases where that trade-off 
really make sense. Don't get me wrong, those options had their places a long 
time ago when hints weren't working all that well, but I think they bring more 
confusion than benefits now.

And I think it's sane to reconsider stuffs every once in a while, and to clean 
up anything that may not make all that much sense anymore, which I think is the 
case here.

Tl;dr, I feel the benefits brought by those options are very slim at best and 
well overshadowed by the confusion they bring, and not worth maintaining the 
code that supports them (which, to be fair, isn't huge, but getting rid of 
{{ReadCallback.AsyncRepairRunner) wouldn't hurt for instance).

Lastly, if the consensus here ends up being that they can have their use in 
weird case and that we fill supporting those cases is worth confusing everyone 
else and maintaining that code, I would still suggest disabling them totally by 
default.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to