> is (2) a direct consequence of a repair on the full token range (and thus
> anti-compaction ran only on a subset of the RF nodes)?

Not necessarily, because even with -pr enabled the nodes are responsible
for different ranges, so they will flush and compact at different
instants. The effect of this on long-running repairs is that data marked
as repaired on one replica may, on another replica, be compacted during
the repair and therefore never get marked as repaired (CASSANDRA-9143),
which will cause a mismatch in the next repair. This could probably be
alleviated by CASSANDRA-6696.
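For reference, a minimal sketch of the repair invocations being compared in
this thread, assuming Cassandra 3.0.x defaults (where nodetool repair is
incremental and parallel unless told otherwise); "my_keyspace" is only a
placeholder:

    # 3.0.x default: incremental, parallel, over every range the node
    # replicates ("full range" in Stefano's description)
    nodetool repair my_keyspace

    # Primary-range variant: each node repairs only the ranges it owns as
    # primary, so the command must be run on every node for full coverage
    nodetool repair -pr my_keyspace

    # Full (non-incremental) repair, which validates repaired and unrepaired
    # data alike
    nodetool repair -full my_keyspace

Whether a given SSTable actually ended up in the repaired set can be checked
with the sstablemetadata tool, which prints a "Repaired at" field (0 means
the SSTable is still in the unrepaired set).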

2016-10-03 12:16 GMT-03:00 Stefano Ortolani <ostef...@gmail.com>:

> I was wondering: is (2) a direct consequence of a repair on the full
> token range (and thus anti-compaction ran only on a subset of the RF
> nodes)? If I understand correctly, a repair with -pr should fix this,
> at the cost of all nodes performing the anticompaction phase?
>
> Cheers,
> Stefano
>
> On Tue, Sep 27, 2016 at 4:09 PM, Stefano Ortolani <ostef...@gmail.com>
> wrote:
> > Didn't know about (2), and I actually have a time drift between the
> > nodes.
> > Thanks a lot Paulo!
> >
> > Regards,
> > Stefano
> >
> > On Thu, Sep 22, 2016 at 6:36 PM, Paulo Motta <pauloricard...@gmail.com>
> > wrote:
> >>
> >> There are a couple of things that could be happening here:
> >> - There will be time differences between when the nodes participating in
> >> the repair flush, so on write-heavy tables there will always be minor
> >> differences during validation, and those can be accentuated by
> >> low-resolution merkle trees, which will affect mostly larger tables.
> >> - SSTables compacted during incremental repair will not be marked as
> >> repaired, so nodes with different compaction cadences will have different
> >> data in their unrepaired set, which will cause mismatches in the
> >> subsequent incremental repairs. CASSANDRA-9143 will hopefully fix that
> >> limitation.
> >>
> >> 2016-09-22 7:10 GMT-03:00 Stefano Ortolani <ostef...@gmail.com>:
> >>>
> >>> Hi,
> >>>
> >>> I am seeing something weird while running repairs.
> >>> I am testing 3.0.9 so I am running the repairs manually, node after
> >>> node, on a cluster with RF=3. I am using a standard repair command
> >>> (incremental, parallel, full range), and I just noticed that the third
> >>> node detected some ranges out of sync with one of the nodes that just
> >>> finished repairing.
> >>>
> >>> Since there was no dropped mutation, that sounds weird to me
> >>> considering that the repairs are supposed to operate on the whole range.
> >>>
> >>> Any idea why?
> >>> Maybe I am missing something?
> >>>
> >>> Cheers,
> >>> Stefano
> >>>
> >>
> >
>
