[ 
https://issues.apache.org/jira/browse/CASSANDRA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746981#comment-14746981
 ] 

Marcus Eriksson commented on CASSANDRA-10342:
---------------------------------------------

Patch [here|https://github.com/krummas/cassandra/commits/marcuse/10342] which 
only defragments if the read touched nothing but unrepaired sstables.
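
To make the change concrete, here is a minimal, self-contained sketch of what 
the check amounts to; the SSTableInfo/shouldDefragment names are invented for 
the example and are not the actual CollationController code:

{code:java}
import java.util.Arrays;
import java.util.List;

/**
 * Illustration only: the decision the 10342 patch adds on top of the existing
 * defragmentation trigger. Names are invented and do not match the real code.
 */
public class DefragCheckSketch
{
    /** Stand-in for the per-sstable metadata the read path already has. */
    static class SSTableInfo
    {
        final boolean repaired;   // set once incremental repair has marked the sstable repaired
        SSTableInfo(boolean repaired) { this.repaired = repaired; }
    }

    /**
     * Previously: defragment whenever the read touched more sstables than the
     * minimum compaction threshold (and, per the report, only with STCS).
     * The patch adds: ... and only if every one of those sstables is unrepaired,
     * so repaired data is never re-inserted into the unrepaired set.
     */
    static boolean shouldDefragment(List<SSTableInfo> sstablesRead, int minCompactionThreshold)
    {
        if (sstablesRead.size() <= minCompactionThreshold)
            return false;
        return sstablesRead.stream().noneMatch(s -> s.repaired);
    }

    public static void main(String[] args)
    {
        List<SSTableInfo> mixed = Arrays.asList(
                new SSTableInfo(true), new SSTableInfo(false), new SSTableInfo(false),
                new SSTableInfo(false), new SSTableInfo(false), new SSTableInfo(false));
        System.out.println(shouldDefragment(mixed, 4)); // false: one of the sstables is repaired
    }
}
{code}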

But I wonder if we should keep doing defragmentation at all in 2.2+, where 
incremental repair is the default - we should probably run a few benchmarks to 
see if the over-repair is worth it. [~mambocab] do you have cycles to run the 
benchmarks? I pushed a branch without the defragmentation 
[here|https://github.com/krummas/cassandra/commits/marcuse/10342-no_defrag] - 
we would need to run a mixed workload with a few incremental repairs thrown in 
and compare read latency and the amount of data streamed against standard 2.1.

> Read defragmentation can cause unnecessary repairs
> --------------------------------------------------
>
>                 Key: CASSANDRA-10342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10342
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Olsson
>            Priority: Minor
>
> After applying the fix from CASSANDRA-10299 to the cluster, we started seeing 
> ~20k small sstables appear for the table with static data when running 
> incremental repair.
> The logs contained several messages about flushes for that table, one for each 
> repaired range. The flushed sstables were 0.000kb in size with < 100 ops each. 
> cfstats showed several writes to that table, even though we were only reading 
> from it and read repair did not repair anything.
> After digging around in the codebase I noticed that defragmentation of data 
> can occur while reading, depending on the query and some other conditions. 
> This causes the read data to be inserted again so that it ends up in a more 
> recent sstable, which can be a problem if that data was repaired using 
> incremental repair. The defragmentation is done in 
> [CollationController.java|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/CollationController.java#L151].
> I guess this wasn't a problem with full repairs, since I assume the digest 
> should be the same even if you have two copies of the same data. But with 
> incremental repair this will most probably cause a mismatch between nodes if 
> that data was already repaired, since the other nodes probably won't have 
> that data in their unrepaired set.
> ------
> I can add that the problems on our cluster were probably due to the fact that 
> CASSANDRA-10299 caused the same data to be streamed multiple times and end up 
> in several sstables. One of the conditions for the defragmentation is that the 
> number of sstables read during a read request has to be more than the minimum 
> number of sstables needed for a compaction (> 4 in our case). So normally I 
> don't think this would cause ~20k sstables to appear; we probably hit an 
> extreme case.
> One workaround for this is to use a compaction strategy other than STCS (it 
> seems to be the only affected strategy, at least in 2.1), but the solution 
> might be to either make defragmentation configurable per table or avoid 
> reinserting the data if any of the sstables involved in the read are repaired.
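
For context, here is a toy model of the read-defragmentation behaviour described 
above (all names are invented for illustration; none of this is Cassandra API): 
the read merges fragments of a partition from every sstable it touched and, if 
it touched more than the minimum compaction threshold, writes the merged result 
back as a fresh local write, which later gets flushed into a new, unrepaired 
sstable.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only; invented types, not Cassandra's actual read path. */
public class ReadDefragSketch
{
    static class Partition
    {
        final Map<String, String> cells = new TreeMap<>();
    }

    /** Stand-in for applying a local mutation, which ends up in a new, unrepaired sstable. */
    interface LocalWriter
    {
        void apply(String partitionKey, Partition merged);
    }

    static Partition read(String partitionKey,
                          List<Partition> fragments,      // one fragment per sstable the read touched
                          int minCompactionThreshold,     // "> 4 in our case" per the description
                          LocalWriter writer)
    {
        // Merge the per-sstable fragments into the result returned to the client
        // (toy merge: later fragments simply overwrite earlier ones).
        Partition merged = new Partition();
        for (Partition fragment : fragments)
            merged.cells.putAll(fragment.cells);

        // The defragmentation step: if the read was spread over "too many" sstables,
        // write the merged result back locally so future reads hit a single sstable.
        // That write lands in an unrepaired sstable, so if the data was already
        // incrementally repaired, the next repair sees a mismatch and streams it again,
        // which is the over-repair this ticket is about.
        if (fragments.size() > minCompactionThreshold)
            writer.apply(partitionKey, merged);

        return merged;
    }

    public static void main(String[] args)
    {
        List<Partition> fragments = new ArrayList<>();
        for (int i = 0; i < 5; i++)   // the same partition spread over 5 sstables
        {
            Partition p = new Partition();
            p.cells.put("col" + i, "value" + i);
            fragments.add(p);
        }
        Partition result = read("pk1", fragments, 4,
                (key, merged) -> System.out.println("re-inserting " + key + " -> new unrepaired sstable"));
        System.out.println(result.cells);
    }
}
{code}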



