[
https://issues.apache.org/jira/browse/CASSANDRA-8571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jonathan Ellis resolved CASSANDRA-8571.
---------------------------------------
Resolution: Duplicate
Fix Version/s: (was: 2.1.3)
> Free space management does not work very well
> ---------------------------------------------
>
> Key: CASSANDRA-8571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8571
> Project: Cassandra
> Issue Type: Bug
> Reporter: Bartłomiej Romański
>
> Hi all,
> We've got an 18-node cluster running 2.1.2, each node equipped with
> 3x 480 GB SSDs (JBOD). We mostly use LCS.
> Recently, our nodes started failing with 'no space left on device'. It all
> started with our mistake - we let our LCS accumulate too much in L0.
> As a result, STCS woke up and we ended up with some big sstables on each
> node (say, 5-10 sstables of 20-50 GB each).
> During normal operation we keep our disks about 50% full. This gives about
> 200 GB of free space on each of them. This was too little for compacting all
> accumulated L0 sstables at once. Cassandra kept trying to do that and kept
> failing...
> Eventually, we managed to stabilize the situation (with some crazy code
> hacking, manually moving sstables, etc.). However, there are a few things
> that would be more than helpful in recovering from such situations more
> automatically...
> First, please look at DiskAwareRunnable.runMayThrow(). This method
> initializes a (local) variable, writeSize. I believe we should check
> somewhere here whether we have enough space on the chosen disk. The problem
> is that writeSize is never read... Am I missing something here?
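> A minimal sketch of the kind of guard I mean, written against plain
> java.io.File rather than the actual DiskAwareRunnable code (hasSpaceFor() is
> a hypothetical helper, not something that exists in 2.1):
>
>     import java.io.File;
>
>     // Hypothetical guard, not the actual 2.1 code: reject a data directory
>     // whose volume cannot hold the estimated size of the output.
>     public final class FreeSpaceCheck
>     {
>         static boolean hasSpaceFor(File dataDir, long writeSize)
>         {
>             // getUsableSpace() returns the bytes available to this JVM
>             // on the file store backing dataDir.
>             return dataDir.getUsableSpace() > writeSize;
>         }
>     }
>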
> By the way, in STCS we first look for the least overloaded disk, and only
> then (if there is more than one such disk) for the one with the most free
> space (note the sort order in Directories.getWriteableLocation()). That's
> often suboptimal (it's usually better to wait for the bigger disk than to
> compact fewer sstables now), but probably not crucial.
> Second, the strategy (used by LCS) of choosing the target disk first and
> then using it for the whole compaction is not the best one. Big compactions
> (e.g. after massive operations like bootstrap or repair, or after LCS issues
> like in our case) on small drives (e.g. a JBOD of SSDs) will never succeed.
> A much better strategy would be to choose the target drive for each output
> sstable separately, or at least round-robin them.
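> Something along these lines (a rough sketch; RoundRobinDirectories and
> nextDirectory() are made-up names, not Cassandra classes):
>
>     import java.io.File;
>     import java.util.concurrent.atomic.AtomicInteger;
>
>     // Sketch of the round-robin idea: pick a target directory per output
>     // sstable instead of once per compaction.
>     public final class RoundRobinDirectories
>     {
>         private final File[] dataDirs;
>         private final AtomicInteger next = new AtomicInteger();
>
>         RoundRobinDirectories(File[] dataDirs) { this.dataDirs = dataDirs; }
>
>         File nextDirectory(long estimatedSstableSize)
>         {
>             // Try each directory once, starting from the round-robin
>             // cursor, and take the first one with enough usable space.
>             for (int i = 0; i < dataDirs.length; i++)
>             {
>                 int idx = (next.getAndIncrement() & Integer.MAX_VALUE) % dataDirs.length;
>                 File dir = dataDirs[idx];
>                 if (dir.getUsableSpace() > estimatedSstableSize)
>                     return dir;
>             }
>             return null; // no directory can hold this sstable
>         }
>     }
>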
> Third, it would be helpful if the default check for MAX_COMPACTING_L0 in
> LeveledManifest.getCandidatesFor() were expanded to also support a limit on
> total space. After the STCS fallback in L0 you end up with very big
> sstables, and 32 of them is just too much for one compaction on small
> drives.
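> Roughly what I have in mind (a sketch over plain sizes, not the real
> getCandidatesFor(); maxTotalBytes is the hypothetical new limit):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     // Sketch of the extra limit: stop taking L0 candidates once either the
>     // existing count cap or a (hypothetical) total-bytes cap is reached.
>     public final class L0CandidateLimit
>     {
>         static List<Long> limitCandidates(List<Long> sstableSizes,
>                                           int maxCount, long maxTotalBytes)
>         {
>             List<Long> picked = new ArrayList<Long>();
>             long total = 0;
>             for (long size : sstableSizes)
>             {
>                 if (picked.size() >= maxCount || total + size > maxTotalBytes)
>                     break;
>                 picked.add(size);
>                 total += size;
>             }
>             return picked;
>         }
>     }
>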
> We finally used a hack similar to the last option (it was the easiest one to
> implement in a hurry), but any of the improvements described above would
> have saved us from all this.
> Thanks,
> BR
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)