[ 
https://issues.apache.org/jira/browse/CASSANDRA-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483951#comment-13483951
 ] 

Sylvain Lebresne commented on CASSANDRA-3306:
---------------------------------------------

That code is a mess, so let me take a shot at describing what happens, for the 
record. Say node1 wants to stream files A, B and C to node2. If everything goes 
well, what happens is:
# node1 sends the first file A with a StreamHeader saying that A, B and C are 
the pending files and A is the file currently being sent. On node2, a new 
StreamInSession is created with that information.
# Once A is finished, node2 removes A from the pending files in the 
StreamInSession and sends an acknowledgement to node1. Then node1 sends B with 
a StreamHeader that has no pending files (the list of pending files is only 
sent the first time, so that the StreamInSession on node2 knows when everything 
is finished) and B as the current file. When node2 receives that StreamHeader, 
it retrieves the StreamInSession and sets B as the current file.
# Once B is finished, node2 removes it from the pending files and acks to 
node1, and node1 sends C with a StreamHeader with no pending files and C as 
the current file. Node2 retrieves the StreamInSession and modifies it 
accordingly.
# Finally, once C is finished, node2 removes it from the pending files. It 
then sees that the pending files are empty, and thus that the streaming is 
finished; at that point it adds all the SSTableReaders created so far to the 
cfs (and acks the end of the streaming to node1).
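The happy path above can be sketched roughly like this (a hypothetical model, not the actual StreamInSession/StreamHeader classes: the first header seeds the pending set, and the session declares itself finished once that set drains):

```java
import java.util.*;

// Hypothetical sketch of the receiving side's bookkeeping.
class StreamSessionSketch {
    private final Set<String> pending = new HashSet<>();
    private String currentFile;
    private final List<String> readers = new ArrayList<>(); // stand-in for created SSTableReaders

    // The first StreamHeader carries the full pending list; later headers carry none.
    void receiveHeader(Collection<String> pendingFiles, String current) {
        pending.addAll(pendingFiles);
        currentFile = current;
    }

    // Called when the current file finishes streaming; returns true when the
    // whole session is done (time to add all readers to the cfs and ack).
    boolean fileFinished() {
        readers.add(currentFile);
        pending.remove(currentFile);
        return pending.isEmpty();
    }

    List<String> createdReaders() { return readers; }
}
```

With this model, A and B each leave the pending set non-empty, and only finishing C reports the session as complete.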

Now, the problem is if, say, node1 is mistakenly marked dead by node2 during 
the streaming of A. If that happens, the only thing we do on node2 is to close 
the session and remove the StreamInSession from the global sessions map.  
However, we don't shut down the stream or anything, so if node1 is in fact 
still alive, what will happen is:
# A will finish its transfer correctly. Once that's done, node2 will still 
send an acknowledgement (probably the first mistake: we could check that the 
session has been closed and send an error instead).
# Node1, on getting its acknowledgement, will send B with a StreamHeader that 
has B as the current file and, as usual, no pending files. On reception, node2 
will not find any StreamInSession (it was removed during the close), so it 
will create a new one as if this were the beginning of a transfer. That 
session will have no pending files (second mistake: if we have to create a new 
StreamInSession but there are no pending files at all, something has gone 
wrong).
# Once B is fully streamed, node2 will acknowledge it to node1 and remove it 
from its StreamInSession. But that session is the new one we just created, 
with no pending files. So the StreamInSession will consider the streaming 
finished, and it will thus add the SSTableReader for B to the cfs.
# Because B has been acknowledged, node1 will start sending C (again, with no 
pending files in the StreamHeader). This happens as soon as B is finished, and 
so concurrently with the StreamInSession on node2 closing itself.
# So when node2 receives the StreamHeader with C, it will try to retrieve the 
session and will find the previous one. And it will happily set C as the 
current file for that session (third and fourth mistakes: a StreamInSession 
should not set a file as current unless it is a pending file for that session, 
and a session could detect that it is being reused even though it has just 
marked itself as finished).
# Now, when the transfer of C finishes, the session will be notified, and 
since it still has no pending files, it will once again consider the streaming 
complete. But since it is still the same session, it still has the 
SSTableReader for B in its list of created readers (as well as the one for C 
now). And that's when it adds B a second time to the DataTracker.
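The guards that would catch the second, third and fourth mistakes could look roughly like this (again a hypothetical sketch, not the actual Cassandra code): a freshly created session with an empty pending list is invalid, a file is only accepted as current if it is pending, and a session that has finished refuses reuse:

```java
import java.util.*;

// Hypothetical guard logic for a receiving stream session.
class GuardedSessionSketch {
    private final Set<String> pending = new HashSet<>();
    private boolean closed = false;

    void init(Collection<String> pendingFiles) { pending.addAll(pendingFiles); }

    // Second mistake: a freshly created session must have pending files;
    // otherwise we should send a SESSION_FAILURE back to the sender.
    boolean validNewSession() { return !pending.isEmpty(); }

    // Third and fourth mistakes: only accept a current file that is actually
    // pending, and never accept anything on a session that already finished.
    boolean acceptCurrentFile(String file) {
        return !closed && pending.contains(file);
    }

    void finishFile(String file) {
        pending.remove(file);
        if (pending.isEmpty())
            closed = true; // session is done; any further header is an error
    }
}
```

In the scenario above, the spurious new session for B would fail `validNewSession()`, and C would be rejected by `acceptCurrentFile()` since it was never pending on that session.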

I also note that we end up never adding the SSTableReader for A to the cfs, 
since the very first StreamInSession was never finished. That is not a big 
deal, in that the stream itself was reported as failed to the client anyway, 
but it shows this is not just a problem of a duplicated SSTableReader 
reference.

Anyway, let me come back to what I said earlier. We should definitely fix some 
if not all of the "mistakes" above (and send a SESSION_FAILURE to node1 as 
soon as we detect something is wrong).

That being said, my remark about the comment in DataTracker being obsolete 
still stands, and replacing the list there by a set would at least have the 
advantage of slightly simplifying the code of 
DataTracker.View.newSSTables(), as well as being more resilient if an 
SSTableReader is added twice. Not a big deal though.
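To illustrate the list-vs-set point (a simplified sketch, not the real DataTracker.View code): merging new readers into a set makes a duplicate a harmless no-op, where the list-based version needed an assertion that every reader was new.

```java
import java.util.*;

// Illustration only: set-based merge of newly flushed/streamed readers
// into the view's live sstable collection.
class ViewSketch {
    static Set<String> newSSTables(Set<String> current, Collection<String> added) {
        Set<String> merged = new HashSet<>(current);
        merged.addAll(added); // adding an already-present reader is a no-op
        return merged;
    }
}
```

With a list, the same duplicate would either trip the assertion seen in the stack trace below or leave two references to the same reader in the view.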

                
> Error in LeveledCompactionStrategy
> ----------------------------------
>
>                 Key: CASSANDRA-3306
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3306
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Radim Kolar
>            Assignee: Yuki Morishita
>         Attachments: 0001-CASSANDRA-3306-test.patch
>
>
> during stress testing, i always get this error making leveledcompaction 
> strategy unusable. Should be easy to reproduce - just write fast.
> ERROR [CompactionExecutor:6] 2011-10-04 15:48:52,179 
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread 
> Thread[CompactionExecutor:6,5,main]
> java.lang.AssertionError
>       at 
> org.apache.cassandra.db.DataTracker$View.newSSTables(DataTracker.java:580)
>       at 
> org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:546)
>       at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:268)
>       at 
> org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:232)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:960)
>       at 
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:199)
>       at 
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:47)
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:131)
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
>       at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>       at java.lang.Thread.run(Thread.java:662)
> and this is in json data for table:
> {
>   "generations" : [ {
>     "generation" : 0,
>     "members" : [ 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 
> 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484 ]
>   }, {
>     "generation" : 1,
>     "members" : [ ]
>   }, {
>     "generation" : 2,
>     "members" : [ ]
>   }, {
>     "generation" : 3,
>     "members" : [ ]
>   }, {
>     "generation" : 4,
>     "members" : [ ]
>   }, {
>     "generation" : 5,
>     "members" : [ ]
>   }, {
>     "generation" : 6,
>     "members" : [ ]
>   }, {
>     "generation" : 7,
>     "members" : [ ]
>   } ]
> }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
