[
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16037380#comment-16037380
]
Jonathan Owens commented on CASSANDRA-13418:
--------------------------------------------
We're chasing what may be a gotcha in our implementation of this. We have one
cluster that runs regular incremental repairs and is ending up with a whole
lot of duplicated data across sstables, we guess due to overstreaming.
Explicitly ignoring overlap is awesome for compacting away tombstones, but does
nothing to detect duplicate partitions across sstables on disk. And in TWCS,
because it buckets by largest timestamp, sstables containing older data
that was streamed later will never appear in the same compaction operation as
the sstable that data "should have" been written to the first time. CASSANDRA-10496
would resolve this eventually by pushing that older data into the correct
bucket, but we need a workaround sooner.
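To make the bucketing problem concrete, here is a minimal Python sketch. It is
a simplified model, not Cassandra's actual TWCS code: the SSTable class, the
window_start helper, the one-hour window, and the timestamps are all assumptions
made up for illustration. It shows an sstable that carries old data but also
some newer data landing in the newer window, away from the sstable that already
holds the same old time range.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for sstable metadata (not a Cassandra class)."""
    name: str
    min_ts: int  # oldest write timestamp in the file, in seconds
    max_ts: int  # newest write timestamp in the file, in seconds


def window_start(ts: int, window_size: int) -> int:
    """Round a timestamp down to the start of its time window."""
    return ts - (ts % window_size)


def bucket_by_max_timestamp(sstables, window_size):
    """Group sstable names by the window that contains their max timestamp."""
    buckets = defaultdict(list)
    for s in sstables:
        buckets[window_start(s.max_ts, window_size)].append(s.name)
    return dict(buckets)


HOUR = 3600
sstables = [
    # Written normally during the first hour.
    SSTable("flushed-hour-0", min_ts=0, max_ts=3599),
    # Streamed in by a later repair: mostly hour-0 data plus a little newer data.
    SSTable("streamed-later", min_ts=100, max_ts=7300),
]

buckets = bucket_by_max_timestamp(sstables, HOUR)
print(buckets)  # {0: ['flushed-hour-0'], 7200: ['streamed-later']}
```

The two files cover overlapping time ranges, yet they sit in different buckets,
so in this model a per-bucket compaction never sees them together.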
We're contemplating a few options:
* I remember, or imagined, a ticket to try to suss out overlapping sstables and
include them in the current compaction operation if found, rather than
cancelling the operation. That seems good here, because in TWCS you should not
have many overlaps, and if you do they need to be addressed somehow or you end
up with duplicates.
* We could switch to cassandra-reaper or something similar and do
higher-precision repairs to reduce overstreaming, though that's a lot of work
to fix what seems really like a compaction artifact.
* Reverting the change would put us back in the world where tombstones don't
expire due to overlap checks failing, so that's out.
* We can write an external tool to detect overlaps and issue user-defined
compactions against them, but that seems really yucky.
* We could never run incremental repairs and rely only on higher consistency
levels on write/read, and let read repair do the work. This doesn't fix the
problem; it only reduces its magnitude.
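As a rough idea of what the external-tool option might look like, here is a
hedged Python sketch of the detection half only. It assumes per-sstable
min/max timestamps have already been collected somehow (for example from
sstablemetadata output); SSTableMeta, cross_window_overlaps, and the sample
values are hypothetical names and data, and "overlap" here means overlapping
timestamp ranges, which is a simplification of what Cassandra checks. Issuing
the user-defined compaction itself is left out.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class SSTableMeta:
    """Hypothetical record of the metadata the tool would collect."""
    name: str
    min_ts: int  # oldest write timestamp, in seconds
    max_ts: int  # newest write timestamp, in seconds


def window_start(ts: int, window_size: int) -> int:
    """Round a timestamp down to the start of its time window."""
    return ts - (ts % window_size)


def cross_window_overlaps(sstables, window_size):
    """Return name pairs whose timestamp ranges overlap although their
    max timestamps place them in different windows."""
    pairs = []
    for a, b in combinations(sstables, 2):
        different_window = (
            window_start(a.max_ts, window_size)
            != window_start(b.max_ts, window_size)
        )
        ranges_overlap = a.min_ts <= b.max_ts and b.min_ts <= a.max_ts
        if different_window and ranges_overlap:
            pairs.append((a.name, b.name))
    return pairs


HOUR = 3600
metadata = [
    SSTableMeta("a", min_ts=0, max_ts=3599),
    SSTableMeta("b", min_ts=100, max_ts=7300),   # old data streamed later
    SSTableMeta("c", min_ts=7200, max_ts=7400),  # purely new data
]

candidates = cross_window_overlaps(metadata, HOUR)
print(candidates)  # [('a', 'b')]
```

Each pair returned would be a candidate for a user-defined compaction. The
pairwise scan is quadratic in the number of sstables, which is part of why
this approach feels yucky at scale.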
I still believe this patch is a good idea, as optimizing for tombstone expiry
is essential with TWCS, but the repair interaction here is worth pointing out.
> Allow TWCS to ignore overlaps when dropping fully expired sstables
> ------------------------------------------------------------------
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Corentin Chary
> Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If
> you really want read-repairs you're going to have sstables blocking the
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a
> very low value and that will purge the blockers of old data that should
> already have expired, thus removing the overlaps and allowing the other
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have
> time series, you might not care if all your data doesn't exactly expire at
> the right time, or if data re-appears for some time, as long as it gets
> deleted as soon as it can. And in this situation I believe it would be really
> beneficial to allow users to simply ignore overlapping SSTables when looking
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset,
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be
> enough to greatly reduce entropy of the most used data (and if you have
> timeseries, you're likely to have a dashboard doing the same important
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)