[ 
https://issues.apache.org/jira/browse/CASSANDRA-6818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-6818:
-------------------------------------------

    Assignee: Yuki Morishita

> SSTable references not released if stream session fails before it starts
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-6818
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6818
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Richard Low
>            Assignee: Yuki Morishita
>
> I observed a large number of 'orphan' SSTables - SSTables that are in the 
> data directory but not loaded by Cassandra - on a 1.1.12 node that had a 
> large stream fail before it started. These orphan files are particularly 
> dangerous because if the node is restarted and picks up these SSTables it 
> could bring data back to life if tombstones have been GCed. To confirm the 
> SSTables are orphans, I created a snapshot and it didn't contain these files. 
> I can see in the logs that they have been compacted, so they should have been 
> deleted.
> The log entries for the stream are:
> {{INFO [StreamStage:1] 2014-02-21 19:41:48,742 StreamOut.java (line 115) 
> Beginning transfer to /10.0.0.1}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:48,743 StreamOut.java (line 96) 
> Flushing memtables for [CFS(Keyspace='ks', ColumnFamily='cf1'), 
> CFS(Keyspace='ks', ColumnFamily='cf2')]...}}
> {{ERROR [GossipTasks:1] 2014-02-21 19:41:49,239 AbstractStreamSession.java 
> (line 113) Stream failed because /10.0.0.1 died or was restarted/removed 
> (streams may still be active in background, but further streams won't be 
> started)}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:51,783 StreamOut.java (line 161) 
> Stream context metadata [...] 2267 sstables.}}
> {{INFO [StreamStage:1] 2014-02-21 19:41:51,789 StreamOutSession.java (line 
> 182) Streaming to /10.0.0.1}}
> {{INFO [Streaming to /10.0.0.1:1] 2014-02-21 19:42:02,218 FileStreamTask.java 
> (line 99) Found no stream out session at end of file stream task - this is 
> expected if the receiver went down}}
> After digging in the code, here's what I think the issue is:
> 1. StreamOutSession.transferRanges() creates a streaming session, which is 
> registered with the failure detector in AbstractStreamSession's constructor.
> 2. Memtables are flushed, potentially taking a long time.
> 3. The remote node fails, convict() is called and the StreamOutSession is 
> closed. However, at this time StreamOutSession.files is empty because it's 
> still waiting for the memtables to flush.
> 4. Memtables finish flushing, references are obtained to the SSTables to be 
> streamed, and the PendingFiles are added to StreamOutSession.files.
> 5. The first stream fails, but the StreamOutSession isn't found, so it is 
> never closed and the references are never released.
> This code is more or less the same on 1.2 so I would expect it to reproduce 
> there. I looked at 2.0 and can't even see where SSTable references are 
> released when the stream fails.
> Some possible fixes for 1.1/1.2:
> 1. Don't register with the failure detector until after the PendingFiles are 
> set up. I think this is the behaviour in 2.0 but I don't know if it was done 
> like this to avoid this issue.
> 2. Detect the above case in (e.g.) StreamOutSession.begin() by noticing the 
> session has already been closed, taking care to avoid double frees.
> 3. Add some synchronization so closeInternal() doesn't race with setting up 
> the session.
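
A minimal sketch of how fixes 2 and 3 above might combine, for illustration only. None of this is Cassandra code: StreamSessionSketch, SSTableRef, Session, addFiles() and close() are hypothetical stand-ins for StreamOutSession, its PendingFiles and closeInternal(). The assumption is that a single lock guards both the closed flag and the file list, so a close that wins the race causes late-arriving references to be released immediately rather than leaked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins; none of these types are Cassandra's real classes.
final class StreamSessionSketch {

    /** Stand-in for a reference-counted SSTable handle. */
    static final class SSTableRef {
        private final AtomicInteger refs = new AtomicInteger();
        void acquire() { refs.incrementAndGet(); }
        void release() { refs.decrementAndGet(); }
        int refCount() { return refs.get(); }
    }

    /** Stand-in for an outbound session that a failure detector may close at any time. */
    static final class Session {
        private final List<SSTableRef> files = new ArrayList<>();
        private boolean closed;

        /**
         * Hands already-acquired references to the session.
         * Returns false, releasing them immediately, if the session was closed first.
         */
        synchronized boolean addFiles(List<SSTableRef> pending) {
            if (closed) {
                for (SSTableRef ref : pending) ref.release();
                return false;
            }
            files.addAll(pending);
            return true;
        }

        /** Idempotent: each held reference is released exactly once. */
        synchronized void close() {
            if (closed) return;
            closed = true;
            for (SSTableRef ref : files) ref.release();
            files.clear();
        }
    }

    public static void main(String[] args) {
        SSTableRef sstable = new SSTableRef();
        Session session = new Session();

        session.close();      // remote node convicted while memtables are still flushing
        sstable.acquire();    // flush finishes; reference obtained for streaming
        boolean started = session.addFiles(List.of(sstable));

        System.out.println("started=" + started + ", refs=" + sstable.refCount());
    }
}
```

In this sketch the caller treats a false return from addFiles() as "do not begin streaming", and a second close() is a no-op, which is one way to avoid the double free mentioned in fix 2.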



--
This message was sent by Atlassian JIRA
(v6.2#6252)
