[
https://issues.apache.org/jira/browse/CASSANDRA-10862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300686#comment-15300686
]
Paulo Motta commented on CASSANDRA-10862:
-----------------------------------------
Is this incremental or non-incremental repair? What version? Do you know if
these 4000 sstables were created from streaming alone or from anti-compaction?
CASSANDRA-6851 reduced the number of sstables generated after anti-compaction,
so maybe that will already help mitigate this.
If the problem is with streaming, we could perhaps add a configurable threshold
and perform STCS on received sstables before adding them to the data tracker
({{OnCompletionRunnable}}). We should probably disable this during bootstrap,
as people generally want their nodes to bootstrap faster. We should also make
sure the sstables are not added to the data tracker one at a time as they are
compacted, but only after all of them have been compacted, so that we can
abort/roll back the transaction if the node fails before that.
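To make the streaming idea concrete, here is a rough sketch of the control flow
I have in mind. It is not Cassandra code: {{ReceivedSSTableStaging}},
{{compactionThreshold}}, {{onStreamComplete}} and the {{compactor}} function are
all hypothetical names, sstables are represented as plain strings, and the list
of visible sstables stands in for the data tracker.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: none of these names exist in Cassandra.
final class ReceivedSSTableStaging {
    private final int compactionThreshold; // assumed configurable threshold
    private final boolean bootstrapping;   // skip the extra compaction while bootstrapping
    // Stand-in for the data tracker: sstables listed here are visible to reads.
    private final List<String> visible = new ArrayList<>();

    ReceivedSSTableStaging(int compactionThreshold, boolean bootstrapping) {
        this.compactionThreshold = compactionThreshold;
        this.bootstrapping = bootstrapping;
    }

    // Called once per stream session, after every sstable has been received.
    // The compactor is a placeholder for an STCS pass over the received files.
    void onStreamComplete(List<String> received,
                          Function<List<String>, List<String>> compactor) {
        List<String> toPublish = received;
        if (!bootstrapping && received.size() >= compactionThreshold) {
            // If the compactor throws, we never reach addAll below, so no
            // partially compacted set becomes visible.
            toPublish = compactor.apply(received);
        }
        visible.addAll(toPublish); // single publish step for the whole session
    }

    List<String> visible() {
        return Collections.unmodifiableList(visible);
    }
}
```

With a threshold of 3 and bootstrap off, a session that received 4 sstables
would publish only whatever the compactor returns. The same session during
bootstrap would publish all 4 untouched. A real implementation would also have
to clean up the temporary files on failure, which this sketch ignores.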
> LCS repair: compact tables before making available in L0
> --------------------------------------------------------
>
> Key: CASSANDRA-10862
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10862
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction, Streaming and Messaging
> Reporter: Jeff Ferland
>
> When doing repair on a system with lots of mismatched ranges, the number of
> tables in L0 goes up dramatically, and so does the number of tables
> referenced for a query. Latency increases dramatically in tandem.
> Eventually all the copied tables are compacted down in L0, then copied into
> L1 (which may be a very large copy), finally reducing the number of SSTables
> per query into the manageable range.
> It seems to me that the cleanest answer is to compact after streaming, then
> mark tables available rather than marking available when the file itself is
> complete.