[ https://issues.apache.org/jira/browse/CASSANDRA-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams reassigned CASSANDRA-6818:
-------------------------------------------

    Assignee: Yuki Morishita

> SSTable references not released if stream session fails before it starts
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6818
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6818
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Richard Low
>            Assignee: Yuki Morishita
>
> I observed a large number of 'orphan' SSTables - SSTables that are in the
> data directory but not loaded by Cassandra - on a 1.1.12 node that had a
> large stream fail before it started. These orphan files are particularly
> dangerous because if the node is restarted and picks up these SSTables, it
> could bring data back to life if tombstones have been GCed. To confirm the
> SSTables are orphans, I created a snapshot and it didn't contain these
> files. I can see in the logs that they have been compacted, so they should
> have been deleted.
>
> The log entries for the stream are:
>
> {{INFO [StreamStage:1] 2014-02-21 19:41:48,742 StreamOut.java (line 115)
> Beginning transfer to /10.0.0.1}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:48,743 StreamOut.java (line 96)
> Flushing memtables for [CFS(Keyspace='ks', ColumnFamily='cf1'),
> CFS(Keyspace='ks', ColumnFamily='cf2')]...}}
> {{ERROR [GossipTasks:1] 2014-02-21 19:41:49,239 AbstractStreamSession.java
> (line 113) Stream failed because /10.0.0.1 died or was restarted/removed
> (streams may still be active in background, but further streams won't be
> started)}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:51,783 StreamOut.java (line 161)
> Stream context metadata [...] 2267 sstables.}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:51,789 StreamOutSession.java (line
> 182) Streaming to /10.0.0.1}}
> {{INFO [Streaming to /10.0.0.1:1] 2014-02-21 19:42:02,218
> FileStreamTask.java (line 99) Found no stream out session at end of file
> stream task - this is expected if the receiver went down}}
>
> After digging in the code, here's what I think the issue is:
>
> 1. StreamOutSession.transferRanges() creates a streaming session, which is
> registered with the failure detector in AbstractStreamSession's constructor.
> 2. Memtables are flushed, potentially taking a long time.
> 3. The remote node fails, convict() is called and the StreamOutSession is
> closed. However, at this time StreamOutSession.files is empty because it's
> still waiting for the memtables to flush.
> 4. Memtables finish flushing, references are obtained to the SSTables to be
> streamed and the PendingFiles are added to StreamOutSession.files.
> 5. The first stream fails but the StreamOutSession isn't found, so it is
> never closed and the references are never released.
>
> This code is more or less the same on 1.2, so I would expect it to
> reproduce there. I looked at 2.0 and can't even see where SSTable
> references are released when the stream fails.
>
> Some possible fixes for 1.1/1.2:
>
> 1. Don't register with the failure detector until after the PendingFiles
> are set up. I think this is the behaviour in 2.0 but I don't know if it was
> done like this to avoid this issue.
> 2. Detect the above case in (e.g.) StreamOutSession.begin() by noticing the
> session has been closed, with care to avoid double frees.
> 3. Add some synchronization so closeInternal() doesn't race with setting up
> the session.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
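
For illustration only, here is a minimal Java sketch of how possible fixes 2 and 3 might combine: the session remembers that it was closed, and the step that adds files checks that flag under the same lock, releasing the freshly acquired references itself if the close already happened. SketchRef, SketchSession, close() and addFilesAndBegin() are hypothetical stand-ins invented for this sketch. They are not the real StreamOutSession or SSTable reference-counting code, and this is not a proposed patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an SSTable reference count (not Cassandra's class).
final class SketchRef {
    private final AtomicInteger count = new AtomicInteger();
    void acquire() { count.incrementAndGet(); }
    void release() { count.decrementAndGet(); }
    int count() { return count.get(); }
}

// Hypothetical stand-in for StreamOutSession, reduced to the state in the race.
final class SketchSession {
    private final List<SketchRef> files = new ArrayList<>();
    private boolean closed = false;

    // Models step 3: the failure detector convicts the peer and closes the session.
    synchronized void close() {
        if (closed) {
            return; // idempotent, so references are released at most once
        }
        closed = true;
        releaseAll();
    }

    // Models step 4: called after the flush, once references have been acquired.
    // Returns false if the session was closed in the meantime; in that case the
    // references are released here instead of being leaked.
    synchronized boolean addFilesAndBegin(List<SketchRef> acquired) {
        files.addAll(acquired);
        if (closed) {
            releaseAll();
            return false;
        }
        return true;
    }

    // Releases every held reference exactly once, then forgets them.
    private void releaseAll() {
        for (SketchRef ref : files) {
            ref.release();
        }
        files.clear();
    }
}
```

Because both methods synchronize on the session and releaseAll() clears the list after releasing, either ordering of close() and addFilesAndBegin() leaves each reference released exactly once in this model.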