Re: running repairs on insert-only tables

2020-11-05 Thread Mitch Gitman
Jeff, good to hear from you.


Based on what you're saying, we can avoid regular repairs on these tables.
We can live with the read repairs because the bulk of these tables are
being used strictly for analytics queries by Spark jobs or for ad-hoc
queries by members of the technical team. They're not part of the
application read path.


Thanks. Not having to do these repairs on a regular basis is a big win for
us.

On Thu, Nov 5, 2020 at 11:33 AM Jeff Jirsa  wrote:

>
>
> > On Nov 5, 2020, at 10:18 AM, Mitch Gitman  wrote:
> >
>
> Hi!
>
> >
> > Now, we could comfortably run all the repairs we need to within our
> > off-hours window if we just left out all our tables that are insert-only.
> > By insert-only, I mean that we have certain classes of tables that we're
> > only inserting into; we're never updating them or deleting them. Therefore,
> > these are tables that have no tombstones, and if repairs are just about
> > clearing out tombstones, then ostensibly they shouldn't need to be
> > repaired. The question is, is that really the case? Is there any reason to
> > still run repairs on insert-only tables?
> >
> > If I come up with my own answer I'm satisfied with, I'll reply to myself
> > here.
>
> A table that never does deletes does indeed have different repair
> requirements.
>
> Strictly speaking, you don’t need to repair it EXCEPT to guarantee
> consistency when replacing a host. If a host does fail, you should repair
> all of the replicas of the down host before you stream in the replacement
> host. That’s likely rare, it’s true for all workloads, and almost nobody
> does it today, but it’s the only real repair requirement for a table that
> doesn’t have deletes.
>
> That said, repair does help reduce differences between replicas, which may
> reduce read repairs, but you’re relying on consistency level for the time
> between insert and repair ANYWAY, so it’s probably fine.
>
>
>
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
>
>


Re: running repairs on insert-only tables

2020-11-05 Thread Jeff Jirsa



> On Nov 5, 2020, at 10:18 AM, Mitch Gitman  wrote:
> 

Hi! 

> 
> Now, we could comfortably run all the repairs we need to within our off-hours 
> window if we just left out all our tables that are insert-only. By 
> insert-only, I mean that we have certain classes of tables that we're only 
> inserting into; we're never updating them or deleting them. Therefore, these 
> are tables that have no tombstones, and if repairs are just about clearing 
> out tombstones, then ostensibly they shouldn't need to be repaired. The 
> question is, is that really the case? Is there any reason to still run 
> repairs on insert-only tables?
> 
> If I come up with my own answer I'm satisfied with, I'll reply to myself here.

A table that never does deletes does indeed have different repair requirements.

Strictly speaking, you don’t need to repair it EXCEPT to guarantee consistency 
when replacing a host. If a host does fail, you should repair all of the 
replicas of the down host before you stream in the replacement host. That’s 
likely rare, it’s true for all workloads, and almost nobody does it today, but 
it’s the only real repair requirement for a table that doesn’t have deletes. 
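
To make "all of the replicas of the down host" concrete, here is a minimal
sketch that lists the nodes sharing at least one token range with a failed
node. It is an illustration only and rests on loud assumptions that are not
from this thread: one token per node, SimpleStrategy-style placement (each
range lives on its owner plus the next RF-1 nodes around the ring), and a
hypothetical helper name, peers_sharing_ranges. Clusters using vnodes or
NetworkTopologyStrategy place replicas differently, so treat the result as a
mental model, not a procedure.

```python
from typing import List


def peers_sharing_ranges(ring: List[str], down: str, rf: int) -> List[str]:
    """Return the nodes that hold a copy of any range the down node held.

    Assumes one token per node and SimpleStrategy-style placement, where a
    range is stored on its owner and the next rf-1 nodes clockwise.
    `ring` lists the nodes in token order.
    """
    if down not in ring:
        raise ValueError(f"{down!r} is not in the ring")
    if rf < 1:
        raise ValueError("replication factor must be at least 1")

    n = len(ring)
    d = ring.index(down)
    # The down node replicates the ranges of the rf-1 nodes before it, and
    # its own range is replicated on the rf-1 nodes after it.
    span = min(rf - 1, n - 1)

    peers: List[str] = []
    for offset in range(-span, span + 1):
        node = ring[(d + offset) % n]
        if node != down and node not in peers:
            peers.append(node)
    return peers


if __name__ == "__main__":
    # Hypothetical six-node ring with RF=3, where n3 has failed.
    ring = ["n1", "n2", "n3", "n4", "n5", "n6"]
    print(peers_sharing_ranges(ring, "n3", rf=3))
```

Under those assumptions, the nodes returned are the ones whose data for the
shared ranges would need to agree before the replacement streams from them.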

That said, repair does help reduce differences between replicas, which may 
reduce read repairs, but you’re relying on consistency level for the time 
between insert and repair ANYWAY, so it’s probably fine. 
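
The consistency-level point can be checked with simple arithmetic: a read is
guaranteed to touch at least one replica that acknowledged a successful write
whenever the write and read acknowledgement counts add up to more than the
replication factor. The sketch below works that out; the function names are
hypothetical, and the replication factor and consistency levels shown are
example values, not settings reported in this thread.

```python
def quorum(rf: int) -> int:
    """Number of replicas that make up a quorum for a replication factor."""
    return rf // 2 + 1


def reads_see_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must overlap every successful write set.

    The overlap is guaranteed when write_acks + read_acks > rf.
    """
    if not (1 <= write_acks <= rf and 1 <= read_acks <= rf):
        raise ValueError("ack counts must be between 1 and rf")
    return write_acks + read_acks > rf


if __name__ == "__main__":
    rf = 3  # example replication factor
    cases = {
        "write QUORUM / read QUORUM": (quorum(rf), quorum(rf)),
        "write ONE / read ONE": (1, 1),
        "write ALL / read ONE": (rf, 1),
    }
    for label, (w, r) in cases.items():
        print(label, reads_see_latest_write(rf, w, r))
```

When the overlap holds, reads are consistent with or without repair; when it
does not, reads can return stale data until read repair or a full repair
brings the replicas back in line.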






running repairs on insert-only tables

2020-11-05 Thread Mitch Gitman
With all the years I've been working with Cassandra, I'm embarrassed that I
have to ask this question.

We have some tables that are taking longer to repair than we're comfortable
with. We're on Cassandra 3.11, so we have to run full repairs as opposed to
incremental repairs, which to my understanding can't be counted on until
Cassandra 4.0. We're running sequential repairs, as opposed to the default
parallel repairs, so that the repairs can run in a low-intensity fashion
while the keyspace is still able to take write and read requests, albeit
preferably and primarily during off-hours. The problem with sequential
repairs is they're taking too long for us and extending into "on-hours."

We could run parallel repairs to speed things up, but that would require
suspending the services in our write pipeline, which we'd rather not resort
to.

Now, we could comfortably run all the repairs we need to within our
off-hours window if we just left out all our tables that are insert-only.
By insert-only, I mean that we have certain classes of tables that we're
only inserting into; we're never updating or deleting rows in them. Therefore,
these are tables that have no tombstones, and if repairs are just about
clearing out tombstones, then ostensibly they shouldn't need to be
repaired. The question is, is that really the case? Is there any reason to
still run repairs on insert-only tables?
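
As a rough sketch of what leaving the insert-only tables out could look like,
the snippet below builds one full, sequential nodetool repair command per
table that still needs routine repair. The keyspace name, the table names,
and the insert-only flags are all hypothetical placeholders, and
build_repair_commands is an illustrative helper, not an existing tool. The
commands are only printed, never executed.

```python
from typing import Dict, List


def build_repair_commands(
    keyspace: str, insert_only_by_table: Dict[str, bool]
) -> List[List[str]]:
    """Return one nodetool argv per table that still gets routine repair.

    Tables flagged as insert-only are skipped. "-full" requests a full
    (non-incremental) repair and "-seq" requests a sequential one.
    """
    commands: List[List[str]] = []
    for table, insert_only in sorted(insert_only_by_table.items()):
        if insert_only:
            continue  # no deletes, so no routine repair is scheduled
        commands.append(
            ["nodetool", "repair", "-full", "-seq", keyspace, table]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical classification of tables in a hypothetical keyspace.
    tables = {
        "events_raw": True,
        "audit_log": True,
        "user_profiles": False,
        "sessions": False,
    }
    for argv in build_repair_commands("app_ks", tables):
        print(" ".join(argv))
```

Keeping the classification in one explicit mapping makes it easy to revisit
if a table ever starts taking deletes or TTLs.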

If I come up with my own answer I'm satisfied with, I'll reply to myself
here.