[
https://issues.apache.org/jira/browse/CASSANDRA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900984#comment-14900984
]
Aleksey Yeschenko edited comment on CASSANDRA-10342 at 9/21/15 5:05 PM:
------------------------------------------------------------------------
See [this comment from Sylvain|https://issues.apache.org/jira/browse/CASSANDRA-7085?focusedCommentId=14593427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14593427].
> Read defragmentation can cause unnecessary repairs
> --------------------------------------------------
>
> Key: CASSANDRA-10342
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10342
> Project: Cassandra
> Issue Type: Bug
> Reporter: Marcus Olsson
> Assignee: Marcus Eriksson
> Priority: Minor
>
> After applying the fix from CASSANDRA-10299 to the cluster, we started seeing
> a problem where ~20k small sstables appeared for the table with static data
> when running incremental repair.
> In the logs there were several messages about flushes for that table, one for
> each repaired range. The flushed sstables were 0.000kb in size with < 100 ops
> each. cfstats showed several writes to that table, even though we were only
> reading from it and read repair did not repair anything.
> After digging around in the codebase I noticed that defragmentation of data
> can occur while reading, depending on the query and some other conditions.
> This causes the read data to be inserted again to have it in a more recent
> sstable, which can be a problem if that data was repaired using incremental
> repair. The defragmentation is done in
> [CollationController.java|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/CollationController.java#L151].
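> Roughly, the logic looks like this (a minimal sketch with made-up names, not
> the actual CollationController code): if a single read had to touch more
> sstables than the compaction threshold, the merged result is pushed back
> through the write path so it lands in one fresh, unrepaired sstable.
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> // Illustrative only: the names (WritePath, maybeDefragment) are made up,
> // this is not the real Cassandra 2.1 code path.
> public class DefragSketch
> {
>     static final int MIN_COMPACTION_THRESHOLD = 4; // STCS default min_threshold
>
>     interface WritePath { void apply(String key, String mergedRow); }
>
>     // After merging a row from several sstables, re-insert the merged result
>     // through the write path so future reads only need one sstable.
>     static void maybeDefragment(String key,
>                                 String mergedRow,
>                                 List<String> sstablesRead,
>                                 boolean sizeTieredStrategy,
>                                 WritePath writePath)
>     {
>         if (sizeTieredStrategy && sstablesRead.size() > MIN_COMPACTION_THRESHOLD)
>         {
>             // The problem: if mergedRow came from repaired sstables, this write
>             // puts that data back into the unrepaired set.
>             writePath.apply(key, mergedRow);
>         }
>     }
>
>     public static void main(String[] args)
>     {
>         maybeDefragment("pk1", "merged-row",
>                         Arrays.asList("a", "b", "c", "d", "e"), // 5 sstables > 4
>                         true,
>                         (k, row) -> System.out.println("defragmented " + k));
>     }
> }
> {code}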
> I guess this wasn't a problem with full repairs, since I assume the digest
> should be the same even if you have two copies of the same data. But with
> incremental repair this will most probably cause a mismatch between nodes if
> that data was already repaired, since the other nodes probably won't have
> that data in their unrepaired set.
> ------
> I can add that the problems on our cluster were probably due to the fact that
> CASSANDRA-10299 caused the same data to be streamed multiple times, ending up
> in several sstables. One of the conditions for the defragmentation is that the
> number of sstables read during a read request has to be more than the minimum
> number of sstables needed for a compaction (> 4 in our case). So normally I
> don't think this would cause ~20k sstables to appear; we probably hit an
> extreme case.
> One workaround for this is to use a compaction strategy other than STCS (it
> seems to be the only affected strategy, at least in 2.1), but the solution
> might be to either make defragmentation configurable per table or avoid
> reinserting the data if any of the sstables involved in the read are repaired.
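> A rough sketch of the second option, assuming a (hypothetical, simplified)
> repaired flag on each sstable involved in the read; the names here are
> illustrative, not the actual 2.1 API:
> {code:java}
> import java.util.Arrays;
> import java.util.List;
>
> // Illustrative guard: skip the re-insert whenever any sstable that
> // contributed to the read has already been marked repaired.
> public class DefragGuardSketch
> {
>     static class SSTable
>     {
>         final boolean repaired;
>         SSTable(boolean repaired) { this.repaired = repaired; }
>         boolean isRepaired() { return repaired; }
>     }
>
>     static boolean shouldDefragment(List<SSTable> sstablesRead, int minCompactionThreshold)
>     {
>         if (sstablesRead.size() <= minCompactionThreshold)
>             return false;
>         for (SSTable sstable : sstablesRead)
>             if (sstable.isRepaired())
>                 return false; // don't move repaired data back into the unrepaired set
>         return true;
>     }
>
>     public static void main(String[] args)
>     {
>         List<SSTable> read = Arrays.asList(new SSTable(true), new SSTable(false),
>                                            new SSTable(false), new SSTable(false),
>                                            new SSTable(false));
>         System.out.println(shouldDefragment(read, 4)); // false: one sstable is repaired
>     }
> }
> {code}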