Re: Switching to Incremental Repair

2024-02-15 Thread Chris Lohfink
I would recommend adding something to C* to be able to flip the repaired state on all sstables quickly (with the default OSS tooling you can turn nodes off one at a time and use sstablerepairedset). It's a life saver to be able to revert back to non-IR if the migration goes south. Same can be used to quickly switch
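The per-node flip described above can be sketched as follows. This is a hedged illustration, not an official procedure: the keyspace/table names and data path are assumptions, and sstablerepairedset must only be run while the node is down.

```shell
# sstablerepairedset works offline only, so drain and stop the node first.
nodetool drain
sudo systemctl stop cassandra

# Collect the SSTables of the table being migrated (paths are illustrative)
# and clear their repaired flag, reverting this node to non-IR state.
find /var/lib/cassandra/data/my_keyspace/my_table-* -name '*-Data.db' \
  > /tmp/sstables.txt
sstablerepairedset --really-set --is-unrepaired -f /tmp/sstables.txt

sudo systemctl start cassandra
```

Running the same sketch with `--is-repaired` flips the state the other way, which is the "quickly switch" case mentioned above.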

Re: Switching to Incremental Repair

2024-02-15 Thread Bowen Song via user
The gc_grace_seconds, which defaults to 10 days, is the maximum safe interval between repairs. How much data gets written during that period of time? Will your nodes run out of disk space because of the new data written during that time? If so, it sounds like your nodes are dangerously close to
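The disk-headroom concern above is simple arithmetic. A minimal sketch with made-up numbers (the write rate and free space are assumptions, not from the thread):

```shell
# If a node ingests WRITE_GB_PER_DAY and repairs can be up to
# gc_grace_seconds apart, it must absorb that much new data between repairs.
WRITE_GB_PER_DAY=50
GC_GRACE_DAYS=10          # default gc_grace_seconds = 864000 s = 10 days
FREE_GB=400

NEEDED_GB=$((WRITE_GB_PER_DAY * GC_GRACE_DAYS))
echo "data written between repairs: ${NEEDED_GB} GB"
if [ "$FREE_GB" -lt "$NEEDED_GB" ]; then
  echo "WARNING: node may run out of disk before the next repair"
fi
```

With these illustrative numbers, 500 GB of new writes against 400 GB free is exactly the "dangerously close" situation the message warns about.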

Re: Switching to Incremental Repair

2024-02-15 Thread Kristijonas Zalys
Hi folks, One last question regarding incremental repair. What would be a safe approach to temporarily stop running incremental repair on a cluster (e.g.: during a Cassandra major version upgrade)? My understanding is that if we simply stop running incremental repair, the cluster's nodes can, in

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
The over-streaming is only problematic for the repaired SSTables, but it can be triggered by inconsistencies within the unrepaired SSTables during an incremental repair session. This is because although an incremental repair will only compare the unrepaired SSTables, it will stream both

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
Thank you very much for your explanation. Streaming happens on the token range level, not the SSTable level, right? So, when running an incremental repair before the full repair, the problem that “some unrepaired SSTables are being marked as repaired on one node but not on another” should not

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Unfortunately repair doesn't compare each partition individually. Instead, it groups multiple partitions together, calculates a hash of them, stores the hash in a leaf of a merkle tree, and then compares the merkle trees between replicas during a repair session. If any one of the partitions
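The bucketing behaviour described above can be illustrated with a toy sketch (this is not Cassandra's actual implementation; file names and bucket size are invented): partitions are grouped into buckets, each bucket is hashed, and replicas compare hashes. One mismatched partition dirties its whole bucket.

```shell
# Two fake "replicas" of 4 partitions; only k3 differs.
printf 'k1:v1\nk2:v2\nk3:v3\nk4:v4\n' > replica_a
printf 'k1:v1\nk2:v2\nk3:vX\nk4:v4\n' > replica_b

split -l 2 replica_a a_        # buckets of 2 "partitions" each
split -l 2 replica_b b_
sha256sum a_aa a_ab | cut -d' ' -f1 > hashes_a
sha256sum b_aa b_ab | cut -d' ' -f1 > hashes_b

# Only the second bucket's hash differs, but both of its partitions
# (k3 AND k4) are flagged, even though only k3 actually changed.
diff hashes_a hashes_b || true
```

This is why a single inconsistent partition can cause far more data than itself to be streamed: the granularity of comparison is the merkle-tree leaf, not the partition.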

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> Caution, using the method you described, the amount of data streamed at the > end with the full repair is not the amount of data written between stopping > the first node and the last node, but depends on the table size, the number > of partitions written, their distribution in the ring and

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Caution, using the method you described, the amount of data streamed at the end with the full repair is not the amount of data written between stopping the first node and the last node, but depends on the table size, the number of partitions written, their distribution in the ring and the

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> That's a feature we need to implement in Reaper. I think disallowing the > start of the new incremental repair would be easier to manage than pausing > the full repair that's already running. It's also what I think I'd expect as > a user. > > I'll create an issue to track this. Thank you,

Re: Switching to Incremental Repair

2024-02-07 Thread Sebastian Marsching
> Full repair running for an entire week sounds excessively long. Even if > you've got 1 TB of data per node, 1 week means the repair speed is less than > 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck > of the full repair speed and work on that instead. We store

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Just one more thing. Make sure you run 'nodetool repair -full' instead of just 'nodetool repair'. That's because the command's default was changed in Cassandra 2.x: it used to be full repair, but the new default is incremental repair. On 07/02/2024 10:28, Bowen Song
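Concretely, being explicit avoids depending on version-specific defaults (the keyspace name below is illustrative):

```shell
nodetool repair -full my_keyspace    # always a full repair
nodetool repair my_keyspace          # incremental on 2.2+, full on older versions
```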

Re: Switching to Incremental Repair

2024-02-07 Thread Bowen Song via user
Not disabling auto-compaction may result in repaired SSTables getting compacted together with unrepaired SSTables before the repaired state is set on them, which leads to a mismatch in the repaired data between nodes, and potentially very expensive over-streaming in a future full repair. You
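One possible per-node ordering that respects this constraint, as a hedged sketch (pre-4.0-style migration; keyspace, table, and paths are assumptions, and sstablerepairedset must run with the node down):

```shell
# Keep repaired and unrepaired SSTables from being compacted together
nodetool disableautocompaction my_keyspace my_table

# Repair while compaction is off
nodetool repair -full my_keyspace my_table

# Offline: mark the repaired SSTables as such
nodetool drain && sudo systemctl stop cassandra
find /var/lib/cassandra/data/my_keyspace/my_table-* -name '*-Data.db' > /tmp/ss.txt
sstablerepairedset --really-set --is-repaired -f /tmp/ss.txt
sudo systemctl start cassandra

nodetool enableautocompaction my_keyspace my_table
```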

Re: Switching to Incremental Repair

2024-02-06 Thread Kristijonas Zalys
Hi folks, Thank you all for your insight, this has been very helpful. I was going through the migration process here and I’m not entirely sure why disabling autocompaction on the node is required?

Re: Switching to Incremental Repair

2024-02-04 Thread Alexander DEJANOVSKI
Hi Sebastian, That's a feature we need to implement in Reaper. I think disallowing the start of the new incremental repair would be easier to manage than pausing the full repair that's already running. It's also what I think I'd expect as a user. I'll create an issue to track this. On Sat, 3

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
Full repair running for an entire week sounds excessively long. Even if you've got 1 TB of data per node, 1 week means the repair speed is less than 2 MB/s, that's very slow. Perhaps you should focus on finding the bottleneck of the full repair speed and work on that instead. On 03/02/2024

Re: Switching to Incremental Repair

2024-02-03 Thread Sebastian Marsching
Hi, > 2. use an orchestration tool, such as Cassandra Reaper, to take care of that > for you. You will still need monitoring and alerting to ensure the repairs run > successfully, but fixing a stuck or failed repair is not very time sensitive, > you can usually leave it till Monday morning if it

Re: Switching to Incremental Repair

2024-02-03 Thread Bowen Song via user
Hi Kristijonas, It is not possible to run two repairs, regardless of whether they are incremental or full, for the same token range and on the same table concurrently. You have two options: 1. create schedules that don't overlap, e.g. run incremental repair daily except the 1st of each
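Option 1 above could be expressed as a crontab along these lines (a sketch only; the times, keyspace, and nodetool path are assumptions, and the full repair is placed on the 1st of the month, which the truncated message appears to be describing):

```shell
# Incremental repair daily, except on the 1st of the month
0 2 2-31 * * /usr/bin/nodetool repair my_keyspace
# Full repair on the 1st, so the two never overlap
0 2 1    * * /usr/bin/nodetool repair -full my_keyspace
```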

Re: Switching to Incremental Repair

2024-02-02 Thread manish khandelwal
They (incremental and full repairs) are required to run separately at different times. You need to identify a schedule, for example, running incremental repairs weekly for 3 weeks and then running a full repair in the 4th week. Regards Manish On Sat, Feb 3, 2024 at 7:29 AM Kristijonas Zalys wrote:

Re: Switching to Incremental Repair

2024-02-02 Thread Kristijonas Zalys
Hi Bowen, Thank you for your help! So given that we would need to run both incremental and full repair for a given cluster, is it safe to have both types of repair running for the same token ranges at the same time? Would it not create a race condition? Thanks, Kristijonas On Fri, Feb 2, 2024

Re: Switching to Incremental Repair

2024-02-02 Thread Bowen Song via user
Hi Kristijonas, To answer your questions: 1. It's still necessary to run full repair on a cluster on which incremental repair is run periodically. The frequency of full repair is more of an art than science. Generally speaking, the less reliable the storage media, the more frequently full