[
https://issues.apache.org/jira/browse/CASSANDRA-12730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15648065#comment-15648065
]
Benjamin Roth commented on CASSANDRA-12730:
-------------------------------------------
Maybe this is off-topic for this issue, but to my understanding it sounds like
doing incremental repairs with MVs has always produced crap. In order to
guarantee a consistent "repairedAt" state, you probably need something like a
sandboxed write path that is separated from the regular write path, to be sure
that streamed mutations and regular mutations are completely separated. When
streaming has finished, you can flush all tables on all nodes and flag the
newly created SSTables as repaired. But that again sounds like a very complex
change.
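
Roughly what I mean, as a toy sketch with made-up class and method names (this
is not actual Cassandra code, just the shape of the idea): streamed mutations
go into their own buffer, and only that buffer is flushed and tagged with a
repairedAt timestamp once streaming is done.

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Toy model of a "sandboxed" write path: streamed (repair) mutations never
 * share a buffer with regular writes, and only the streamed buffer gets
 * flushed and tagged as repaired. All names here are invented for
 * illustration; nothing below is Cassandra's real API.
 */
public class SandboxedWritePathSketch {

    // Regular writes and streamed writes never share a memtable.
    private final Map<String, String> regularMemtable = new ConcurrentHashMap<>();
    private final Map<String, String> streamedMemtable = new ConcurrentHashMap<>();

    public void applyRegular(String key, String value) {
        regularMemtable.put(key, value);
    }

    public void applyStreamed(String key, String value) {
        streamedMemtable.put(key, value);
    }

    /**
     * Called once streaming has finished: flush only the streamed buffer and
     * record a repairedAt timestamp next to the resulting file, so the new
     * data can be treated as repaired without touching unrepaired SSTables.
     */
    public Path flushStreamedAsRepaired(Path dataDir) throws IOException {
        long repairedAt = Instant.now().toEpochMilli();
        Path sstable = dataDir.resolve("streamed-" + repairedAt + "-Data.db");
        StringBuilder sb = new StringBuilder();
        streamedMemtable.forEach((k, v) -> sb.append(k).append('\t').append(v).append('\n'));
        Files.writeString(sstable, sb.toString());
        // Side file standing in for the repairedAt flag in SSTable metadata.
        Files.writeString(dataDir.resolve(sstable.getFileName() + ".repairedAt"),
                          Long.toString(repairedAt));
        streamedMemtable.clear();
        return sstable;
    }

    public static void main(String[] args) throws IOException {
        SandboxedWritePathSketch cf = new SandboxedWritePathSketch();
        cf.applyRegular("pk1", "regular write");
        cf.applyStreamed("pk2", "mutation received via repair stream");
        Path flushed = cf.flushStreamedAsRepaired(Files.createTempDirectory("sandbox"));
        System.out.println("flushed repaired sstable: " + flushed);
    }
}
{code}
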
If the write path stays local, say you only have views with the same partition
key as the base table, that process could be simplified a bit, e.g. by
streaming the SSTable directly to disk (as if there were no view) and then
building the view from that single SSTable. But that would require an
"offline" view build that does not go through the regular write path. Then you
could also flag the view SSTable as repaired.
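
Again just a toy sketch (invented names and a fake tab-separated file layout,
purely to illustrate the idea): derive the view rows from exactly one streamed
base SSTable, write them out, and flag the result as repaired.

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

/**
 * Toy model of the "offline" view build: take a single base-table SSTable
 * that was streamed straight to disk, derive the view rows from it without
 * going through the regular write path, and write them out as a view SSTable
 * that can be flagged repaired together with the base SSTable. The names and
 * the tab-separated layout are made up for illustration only.
 */
public class OfflineViewBuildSketch {

    /** Build the view SSTable from exactly one base SSTable. */
    static Path buildViewFrom(Path baseSSTable, Path viewDir) throws IOException {
        List<String> baseRows = Files.readAllLines(baseSSTable);
        StringBuilder viewRows = new StringBuilder();
        for (String row : baseRows) {
            // invented base layout: partitionKey \t viewColumn \t payload
            String[] cols = row.split("\t", 3);
            if (cols.length < 3) continue;
            // Same partition key in base and view, so the result stays local;
            // the view just reorders the remaining columns.
            viewRows.append(cols[0]).append('\t')
                    .append(cols[2]).append('\t')
                    .append(cols[1]).append('\n');
        }
        Path viewSSTable = viewDir.resolve("view-" + baseSSTable.getFileName());
        Files.writeString(viewSSTable, viewRows.toString());
        // Stand-in for marking the freshly built view SSTable as repaired.
        Files.writeString(viewDir.resolve(viewSSTable.getFileName() + ".repairedAt"),
                          Long.toString(System.currentTimeMillis()));
        return viewSSTable;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("offline-view");
        Path base = dir.resolve("base-1-Data.db");
        Files.writeString(base, "pk1\tv-a\tpayload-1\npk1\tv-b\tpayload-2\n");
        System.out.println("built " + buildViewFrom(base, dir));
    }
}
{code}
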
Maybe that sounds easier than it actually is and maybe I missed something - I
am quite new to the Cassandra codebase. Just adding my thoughts.
> Thousands of empty SSTables created during repair - TMOF death
> --------------------------------------------------------------
>
> Key: CASSANDRA-12730
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12730
> Project: Cassandra
> Issue Type: Bug
> Components: Local Write-Read Paths
> Reporter: Benjamin Roth
> Priority: Critical
>
> Last night I ran a repair on a keyspace with 7 tables and 4 MVs, each
> containing a few hundred million records. After a few hours a node died
> because of "too many open files".
> Normally one would just raise the limit, but: we had already set it to 100k.
> The problem was that the repair created roughly over 100k SSTables for a
> certain MV. The strange thing is that these SSTables had almost no data (like
> 53 bytes, 90 bytes, ...). Some of them (<5%) had a few hundred KB, and very
> few (<1%) had normal sizes of a few MB or more. I could understand that
> SSTables queue up as they are flushed and not compacted in time, but then
> they should have at least a few MB each (depending on config and available
> memory), right?
> Of course the node then runs out of FDs, and I guess it is not a good idea
> to raise the limit even higher, as I expect that this would just create even
> more empty SSTables before the node finally dies.
> Only 1 CF (MV) was affected. All other CFs (also MVs) behaved sanely. The
> empty SSTables were created evenly over time, 100-150 every minute. Among
> the empty SSTables there are also tables that look normal, having a few MB.
> I didn't see any errors or exceptions in the logs until TMOF occurred. Just
> tons of streams due to the repair (which I actually ran via cs-reaper as
> subrange, full repairs).
> After having restarted that node (with no repair running anymore), the
> number of SSTables went down again as they were slowly compacted away.
> According to [~zznate] this issue may be related to CASSANDRA-10342 +
> CASSANDRA-8641
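
For what it's worth, a quick way to spot the symptom described above
(thousands of nearly-empty SSTables piling up) is to count data files below a
small size threshold. A minimal sketch; the data directory path and the 1 KB
cutoff are example values only:

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Stream;

/**
 * Small diagnostic: walk a Cassandra data directory and count how many
 * SSTable data files (*-Data.db) are nearly empty, as reported above.
 */
public class TinySSTableCounter {

    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        long threshold = 1024; // the "empty" SSTables in the report were ~50-90 bytes

        AtomicLong tiny = new AtomicLong();
        AtomicLong total = new AtomicLong();
        try (Stream<Path> files = Files.walk(dataDir)) {
            files.filter(p -> p.getFileName().toString().endsWith("-Data.db"))
                 .forEach(p -> {
                     total.incrementAndGet();
                     try {
                         if (Files.size(p) < threshold) {
                             tiny.incrementAndGet();
                         }
                     } catch (IOException e) {
                         // file may have been compacted away while walking
                     }
                 });
        }
        System.out.printf("%d of %d SSTables are smaller than %d bytes%n",
                          tiny.get(), total.get(), threshold);
    }
}
{code}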