Hi Jeff,

Does subrange repair mark the SSTables as repaired? As far as I remember, it doesn't.


Regards,
Bowen


On 27/11/2023 16:47, Jeff Jirsa wrote:
I don’t work for DataStax, that’s not my blog, and I’m on a phone and potentially missing nuance, but I’d never try to convert a cluster to IR by disabling autocompaction. That advice sounds very much out of date, or it’s optimized for fixing one node in a cluster somehow. It didn’t make sense in the 4.0 era.

Instead, I’d leave compaction running and slowly run incremental repair across parts of the token range, slowing down as pending compactions increase.

I’d choose token ranges such that you’d repair 5-10% of the data on each node at a time.
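Something like this, roughly (the keyspace name and token values below are made up; pick slices that match your partitioner and ring):

    # On C* 4.x, plain "nodetool repair" is incremental by default.
    # -st/-et restrict the repair to one slice of the token range:
    nodetool repair -st -9223372036854775808 -et -7378697629483820647 my_keyspace

    # Check pending compactions before starting the next slice;
    # back off if the number keeps climbing:
    nodetool compactionstats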



On Nov 23, 2023, at 11:31 PM, Sebastian Marsching <sebast...@marsching.com> wrote:

 Hi,

we are currently in the process of migrating from C* 3.11 to C* 4.1 and we want to start using incremental repairs after the upgrade has been completed. It seems like all the really bad bugs that made using incremental repairs dangerous in C* 3.x have been fixed in 4.x, and for our specific workload, incremental repairs should offer a significant performance improvement.

Therefore, I am currently devising a plan for how we could migrate to using incremental repairs. I am aware of the guide from DataStax (https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/operations/opsRepairNodesMigration.html), but this guide is quite old and was written with C* 3.0 in mind, so I am not sure whether it still fully applies to C* 4.x.

In addition to that, I am not sure whether this approach fits our workload. In particular, I am wary of disabling autocompaction for an extended period of time (if you are interested in the reasons why, they are at the end of this e-mail).

Therefore, I am wondering whether a slightly different process might work better for us (a rough command-level sketch follows the list):

1. Run a full repair (we periodically run those anyway).
2. Mark all SSTables as repaired, even though they will include data that has not been repaired yet because it was added while the repair process was running.
3. Run another full repair.
4. Start using incremental repairs (and the occasional full repair in order to handle bit rot etc.).
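
For concreteness, here is roughly what I think these steps would look like as commands (the keyspace name and SSTable list file are placeholders, and as far as I understand, sstablerepairedset has to be run on each node while that node is stopped):

    # Steps 1 and 3: full repair (-full is needed on 4.x, where
    # incremental is the default):
    nodetool repair -full my_keyspace

    # Step 2, on each node while it is down, using the offline tool
    # from the DataStax guide:
    sstablerepairedset --really-set --is-repaired -f sstable-list.txt

    # Step 4: routine incremental repairs from then on:
    nodetool repair my_keyspace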

If I understand the interactions between full and incremental repairs correctly, step 3 should repair potential inconsistencies in the SSTables that were marked as repaired in step 2, while avoiding the overstreaming that would occur if we only marked as repaired those SSTables that already existed before step 1.

Does anyone see a flaw in this concept, or have experience with a similar scenario (migrating to incremental repairs in an environment with high-density nodes, where a single table contains most of the data)?

I am also interested in hearing about problems other C* users have experienced when migrating to incremental repairs, so that we get a better idea of what to expect.

Thanks,
Sebastian


Here is why I am being cautious:

More than 95 percent of our data is stored in a single table, and we use high-density nodes (storing about 3 TB of data per node). This means that a full repair of the whole cluster takes about a week.

The reason for this layout is that most of our data is “cold”: it is written once, never updated, and rarely deleted or read. However, new data is added continuously, so disabling autocompaction for the duration of a full repair would lead to a high number of small SSTables accumulating over the course of the week. I am not sure how well the cluster would handle such a situation (or the increased load once autocompaction is enabled again).
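
If we did disable autocompaction, I would at least want to keep an eye on how many small SSTables pile up, e.g. (keyspace and table names are placeholders):

    # SSTable count per table:
    nodetool tablestats my_keyspace.my_table

    # Pending compactions, especially once autocompaction is re-enabled:
    nodetool compactionstats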
