Re: running repairs on insert-only tables
Jeff, good to hear from you. Based on what you're saying, we can avoid regular repairs on these tables. We can live with the read repairs because the bulk of these tables are being used strictly for analytics queries by Spark jobs or for ad-hoc queries by members of the technical team. They're not part of the application read path.

Thanks. Not having to do these repairs on a regular basis is a big win for us.

On Thu, Nov 5, 2020 at 11:33 AM Jeff Jirsa wrote:

> > On Nov 5, 2020, at 10:18 AM, Mitch Gitman wrote:
> >
> > Hi!
> >
> > Now, we could comfortably run all the repairs we need to within our off-hours window if we just left out all our tables that are insert-only. By insert-only, I mean that we have certain classes of tables that we're only inserting into; we're never updating them or deleting them. Therefore, these are tables that have no tombstones, and if repairs are just about clearing out tombstones, then ostensibly they shouldn't need to be repaired. The question is, is that really the case? Is there any reason to still run repairs on insert-only tables?
> >
> > If I come up with my own answer I'm satisfied with, I'll reply to myself here.
>
> A table that never does deletes does indeed have different repair requirements.
>
> You strictly don’t need to repair it EXCEPT to guarantee consistency when replacing a host. If you do have a host fail, then strictly speaking you should repair all of the replicas of the down host before you stream in the replacement host, but that’s likely rare and this is true for all workloads and almost nobody does it today but that’s the only real repair requirement for a table that doesn’t have deletes.
>
> That said: repair does help reduce differences which may reduce read repairs, but you’re relying on consistency level for time between insert and repair ANYWAY so it’s probably fine.
>
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
> For additional commands, e-mail: user-h...@cassandra.apache.org
Re: running repairs on insert-only tables
> On Nov 5, 2020, at 10:18 AM, Mitch Gitman wrote:
>
> Hi!
>
> Now, we could comfortably run all the repairs we need to within our off-hours window if we just left out all our tables that are insert-only. By insert-only, I mean that we have certain classes of tables that we're only inserting into; we're never updating them or deleting them. Therefore, these are tables that have no tombstones, and if repairs are just about clearing out tombstones, then ostensibly they shouldn't need to be repaired. The question is, is that really the case? Is there any reason to still run repairs on insert-only tables?
>
> If I come up with my own answer I'm satisfied with, I'll reply to myself here.

A table that never does deletes does indeed have different repair requirements.

You strictly don’t need to repair it EXCEPT to guarantee consistency when replacing a host. If you do have a host fail, then strictly speaking you should repair all of the replicas of the down host before you stream in the replacement host, but that’s likely rare and this is true for all workloads and almost nobody does it today but that’s the only real repair requirement for a table that doesn’t have deletes.

That said: repair does help reduce differences which may reduce read repairs, but you’re relying on consistency level for time between insert and repair ANYWAY so it’s probably fine.
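To make the consistency-level remark concrete, here is a minimal sketch of the usual overlap rule: a read is guaranteed to touch at least one replica that acknowledged the write when the number of replicas written plus the number read exceeds the replication factor. The function names are hypothetical, and the sketch assumes a single datacenter and only the simple levels ONE, TWO, THREE, QUORUM and ALL; it is an illustration, not something taken from the thread.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replicas that must respond for a consistency level.

    Assumes a single datacenter; LOCAL_* and EACH_* levels are not modeled.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    level = level.upper()
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def reads_overlap_writes(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when every read must include a replica that acknowledged the write."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf


if __name__ == "__main__":
    for w, r in [("QUORUM", "QUORUM"), ("ONE", "ONE"), ("ONE", "ALL")]:
        print(w, r, reads_overlap_writes(w, r, rf=3))
```

When the two levels do not overlap (for example ONE and ONE at a replication factor of 3), a read can land on a replica that missed the insert, and that gap is what repair or read repair eventually closes.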
running repairs on insert-only tables
With all the years I've been working with Cassandra, I'm embarrassed that I have to ask this question.

We have some tables that are taking longer to repair than we're comfortable with. We're on Cassandra 3.11, so we have to run full repairs as opposed to incremental repairs, which to my understanding can't be counted on until Cassandra 4.0.

We're running sequential repairs, as opposed to the default parallel repairs, so that the repairs can run in a low-intensity fashion while the keyspace is still able to take write and read requests, albeit preferably and primarily during off-hours. The problem with sequential repairs is they're taking too long for us and extending into "on-hours." We could run parallel repairs to speed things up, but that would require suspending the services in our write pipeline, which we'd rather not resort to.

Now, we could comfortably run all the repairs we need to within our off-hours window if we just left out all our tables that are insert-only. By insert-only, I mean that we have certain classes of tables that we're only inserting into; we're never updating them or deleting them. Therefore, these are tables that have no tombstones, and if repairs are just about clearing out tombstones, then ostensibly they shouldn't need to be repaired. The question is, is that really the case? Is there any reason to still run repairs on insert-only tables?

If I come up with my own answer I'm satisfied with, I'll reply to myself here.
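For anyone wanting to try the "leave out the insert-only tables" approach, the following is a rough sketch of building a full, sequential `nodetool repair` command that names only the remaining tables. The keyspace name, table names and the helper `build_repair_command` are all hypothetical; the only parts taken from nodetool itself are the `repair` subcommand and its `-full` and `-seq` options. The script only prints the command and does not run it.

```python
from typing import Iterable, List


def build_repair_command(keyspace: str,
                         all_tables: Iterable[str],
                         insert_only: Iterable[str]) -> List[str]:
    """Return a nodetool argv that repairs only tables not marked insert-only.

    Returns an empty list when nothing is left to repair, because passing a
    keyspace with no table names would make nodetool repair every table in it.
    """
    skip = set(insert_only)
    targets = [t for t in all_tables if t not in skip]
    if not targets:
        return []
    # -full: full (non-incremental) repair; -seq: sequential instead of parallel.
    return ["nodetool", "repair", "-full", "-seq", keyspace, *targets]


if __name__ == "__main__":
    # Hypothetical keyspace and table names, for illustration only.
    cmd = build_repair_command(
        keyspace="example_ks",
        all_tables=["accounts", "events_log", "sessions", "audit_log"],
        insert_only=["events_log", "audit_log"],
    )
    print(" ".join(cmd))
```

The list of insert-only tables is kept explicit here on the assumption that someone maintains it by hand; a table that later starts receiving deletes would need to be taken off that list so it goes back into the regular repair schedule.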