Hi everyone, I just posted a proposed solution to some issues with incremental repair in CASSANDRA-9143. The solution involves non-trivial changes to the way incremental repair works, so I’m giving it a shout out on the dev list in the spirit of increasing the flow of information here.
Summary of problem:

Anticompaction excludes sstables that have been compacted, or are currently compacting. Anticompaction can also fail on a single machine for any number of reasons. In either scenario, a potentially large amount of data ends up marked as unrepaired on one machine while it’s marked as repaired on the others. During the next incremental repair, that data is unnecessarily streamed out to the other nodes, because it isn’t in their unrepaired data.

Proposed solution:

Add a ‘pending repair’ bucket to the existing repaired and unrepaired sstable buckets. We do the anticompaction up front, but put the anticompacted data into the pending bucket. From here, the repair proceeds normally against the pending sstables, with the streamed sstables also going into the pending bucket. Once all nodes have completed streaming, the pending sstables are moved into the repaired bucket, or back into unrepaired if there’s a failure.

- Blake
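
To make the bucket transitions above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not the patch on the ticket: the names (Bucket, Node, prepare, receive_stream, finalize) are made up for illustration, and it moves whole sstables between buckets rather than splitting them by token range the way real anticompaction does.

```python
from enum import Enum


class Bucket(Enum):
    UNREPAIRED = "unrepaired"
    PENDING = "pending repair"
    REPAIRED = "repaired"


class Node:
    """Toy model of one node: maps an sstable name to its bucket (hypothetical)."""

    def __init__(self, sstables):
        self.buckets = {name: Bucket.UNREPAIRED for name in sstables}

    def prepare(self, in_range):
        # Anticompaction up front: unrepaired data covered by the repair
        # goes into the pending bucket before any streaming happens.
        for name in in_range:
            if self.buckets.get(name) is Bucket.UNREPAIRED:
                self.buckets[name] = Bucket.PENDING

    def receive_stream(self, name):
        # Sstables streamed in during the repair also land in pending.
        self.buckets[name] = Bucket.PENDING

    def finalize(self, success):
        # Once every node has finished streaming, pending data is promoted
        # to repaired; on failure it goes back to unrepaired.
        target = Bucket.REPAIRED if success else Bucket.UNREPAIRED
        for name, bucket in self.buckets.items():
            if bucket is Bucket.PENDING:
                self.buckets[name] = target


def finish_repair(nodes, all_streamed_ok):
    """Apply the same outcome on every node so the buckets stay in agreement."""
    for node in nodes:
        node.finalize(all_streamed_ok)
```

The point of the sketch is that every participant makes the same pending-to-repaired (or pending-to-unrepaired) move at the end, so a failure on one machine can't leave it disagreeing with the others about what has been repaired.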