[ https://issues.apache.org/jira/browse/CASSANDRA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992158#comment-14992158 ]

Jim Witschey commented on CASSANDRA-10342:
------------------------------------------

[~krummas] Thanks for the ping, and sorry for letting this fall on the floor. 
I've put it on my list of things to look at post-3.0 release.

> Read defragmentation can cause unnecessary repairs
> --------------------------------------------------
>
>                 Key: CASSANDRA-10342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10342
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Olsson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>
> After applying the fix from CASSANDRA-10299 to the cluster, we started seeing 
> ~20k small sstables appear for the table with static data when running 
> incremental repair.
> The logs contained several messages about flushes for that table, one for 
> each repaired range. The flushed sstables were 0.000kb in size with < 100 ops 
> each. cfstats also showed several writes to that table, even though we were 
> only reading from it and read repair did not repair anything.
> After digging around in the codebase I noticed that defragmentation of data 
> can occur during reads, depending on the query and some other conditions. 
> The data that was read is then inserted again so that it ends up in a more 
> recent sstable, which is a problem if that data was already repaired using 
> incremental repair. The defragmentation is done in 
> [CollationController.java|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/CollationController.java#L151] 
> (a sketch of the trigger condition follows below).
> I guess this wasn't a problem with full repairs, since I assume the digest 
> should be the same even if you have two copies of the same data. But with 
> incremental repair this will most probably cause a mismatch between nodes if 
> that data was already repaired, since the other nodes probably won't have 
> that data in their unrepaired set.
> ------
> I can add that the problems on our cluster were probably due to the fact that 
> CASSANDRA-10299 caused the same data to be streamed multiple times, ending up 
> in several sstables. One of the conditions for the defragmentation is that 
> the number of sstables read during a read request has to be greater than the 
> minimum number of sstables needed for a compaction (> 4 in our case). So 
> normally I don't think this would cause ~20k sstables to appear; we probably 
> hit an extreme case.
> One workaround is to use a compaction strategy other than STCS (it seems to 
> be the only affected strategy, at least in 2.1), but the solution might be to 
> either make defragmentation configurable per table or avoid re-inserting the 
> data if any of the sstables involved in the read are repaired (also sketched 
> below).
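
For readers less familiar with this code path, here is a minimal, self-contained sketch of the trigger condition described above. All names in it (DefragTriggerSketch, TableConfig, shouldDefragment) are hypothetical stand-ins for illustration only; the actual check lives in CollationController.java on the cassandra-2.1 branch linked in the description.

{code:java}
// Self-contained sketch of the read-defragmentation trigger described in the
// ticket. Every type and name here is a hypothetical stand-in, not Cassandra's
// internal API; see CollationController.java on the cassandra-2.1 branch for
// the real implementation.
public final class DefragTriggerSketch
{
    static final class TableConfig
    {
        final int minCompactionThreshold;       // STCS default is 4
        final boolean autoCompactionEnabled;
        final boolean usesSizeTieredCompaction;

        TableConfig(int minCompactionThreshold,
                    boolean autoCompactionEnabled,
                    boolean usesSizeTieredCompaction)
        {
            this.minCompactionThreshold = minCompactionThreshold;
            this.autoCompactionEnabled = autoCompactionEnabled;
            this.usesSizeTieredCompaction = usesSizeTieredCompaction;
        }
    }

    // A read "defragments" (re-inserts the row it just assembled) only when it
    // had to touch more sstables than the minimum needed to start a compaction,
    // auto-compaction is enabled, and the table uses size-tiered compaction.
    static boolean shouldDefragment(int sstablesIterated, TableConfig config)
    {
        return sstablesIterated > config.minCompactionThreshold
               && config.autoCompactionEnabled
               && config.usesSizeTieredCompaction;
    }

    public static void main(String[] args)
    {
        TableConfig stcsTable = new TableConfig(4, true, true);
        // A read that visited 5 sstables triggers the re-insert; with
        // incremental repair that copy lands in the unrepaired set.
        System.out.println(shouldDefragment(5, stcsTable)); // true
        System.out.println(shouldDefragment(3, stcsTable)); // false
    }
}
{code}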
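
And a similarly hedged sketch of the second fix suggested above: skip the re-insert when any sstable touched by the read is already marked repaired, so defragmentation can never move repaired data back into the unrepaired set. SSTableView is an assumed stand-in for whatever exposes the repairedAt marker to the read path, used only to illustrate the idea.

{code:java}
import java.util.Arrays;
import java.util.List;

// Sketch of the "skip defragmentation if any sstable in the read is repaired"
// option. SSTableView is a hypothetical stand-in for whatever exposes the
// repairedAt marker (0 meaning unrepaired) to the read path.
public final class RepairedAwareDefragSketch
{
    static final class SSTableView
    {
        final long repairedAt;              // 0 means "unrepaired"

        SSTableView(long repairedAt)
        {
            this.repairedAt = repairedAt;
        }

        boolean isRepaired()
        {
            return repairedAt > 0;
        }
    }

    // Only defragment when every sstable touched by the read is unrepaired,
    // so the re-inserted copy cannot create a repaired/unrepaired mismatch
    // between nodes.
    static boolean safeToDefragment(List<SSTableView> sstablesRead)
    {
        for (SSTableView sstable : sstablesRead)
            if (sstable.isRepaired())
                return false;
        return true;
    }

    public static void main(String[] args)
    {
        List<SSTableView> mixed = Arrays.asList(new SSTableView(0), new SSTableView(1441000000000L));
        List<SSTableView> unrepaired = Arrays.asList(new SSTableView(0), new SSTableView(0));
        System.out.println(safeToDefragment(mixed));      // false: one sstable is repaired
        System.out.println(safeToDefragment(unrepaired)); // true
    }
}
{code}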


