[
https://issues.apache.org/jira/browse/CASSANDRA-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Jirsa updated CASSANDRA-11209:
-----------------------------------
Comment: was deleted
(was: Similar to CASSANDRA-10510 as well ?)
> SSTable ancestor leaked reference
> ---------------------------------
>
> Key: CASSANDRA-11209
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11209
> Project: Cassandra
> Issue Type: Bug
> Components: Compaction
> Reporter: Jose Fernandez
> Assignee: Marcus Eriksson
> Attachments: screenshot-1.png, screenshot-2.png
>
>
> We're running a fork of 2.1.13 that adds the TimeWindowCompactionStrategy
> from [~jjirsa]. We had been running 4 clusters without any issues for many
> months until, a few weeks ago, we started scheduling incremental repairs
> every 24 hours (previously we didn't run any repairs at all).
> Since then we have noticed big discrepancies between the LiveDiskSpaceUsed
> and TotalDiskSpaceUsed metrics and the actual size of files on disk. The
> numbers are brought back in sync by restarting the node. We also noticed
> that when this bug happens there are several ancestors that don't get
> cleaned up. A restart will queue up a lot of compactions that slowly eat
> away the ancestors.
> I looked at the code and noticed that we only decrease the LiveTotalDiskUsed
> metric in the SSTableDeletingTask. Since no errors are being logged, I'm
> assuming that for some reason this task is not getting queued up. If I
> understand correctly, the task is only queued when the reference count for
> the SSTable reaches 0. This leads us to believe that something during
> repairs and/or compactions is leaking a reference to the ancestor SSTable.
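
To illustrate the mechanism the reporter suspects, here is a minimal sketch of reference-counted cleanup. It is not Cassandra's code: TrackedTable, tryRef, release, and the AtomicLong used as a disk-space counter are all hypothetical names chosen for the example. The only assumption carried over from the description is that the metric is decremented solely when the last reference is released.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an SSTable whose on-disk size is tracked by a shared counter.
final class TrackedTable {
    private final long bytesOnDisk;
    private final AtomicLong diskSpaceUsed;                   // hypothetical metric
    private final AtomicInteger refs = new AtomicInteger(1);  // creator holds the first reference

    TrackedTable(long bytesOnDisk, AtomicLong diskSpaceUsed) {
        this.bytesOnDisk = bytesOnDisk;
        this.diskSpaceUsed = diskSpaceUsed;
        diskSpaceUsed.addAndGet(bytesOnDisk);
    }

    // Take an extra reference (e.g. for a repair or a compaction).
    // Returns false once the table has been fully released.
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    // Drop one reference. Only the final release runs the cleanup,
    // which here stands in for queuing the deleting task.
    void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            diskSpaceUsed.addAndGet(-bytesOnDisk);
        } else if (n < 0) {
            throw new IllegalStateException("released more often than referenced");
        }
    }
}
```

In this sketch, if a caller takes a reference with tryRef() and never calls release(), the count never reaches 0, the cleanup never runs, and the counter stays inflated without any error being raised. That would match the symptom described above, though it does not establish where the actual leak is.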