[
https://issues.apache.org/jira/browse/CASSANDRA-3306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483951#comment-13483951
]
Sylvain Lebresne commented on CASSANDRA-3306:
---------------------------------------------
That code is a mess, so let me take a shot at describing what happens, for the
record. Say node1 wants to stream files A, B and C to node2. If everything goes
well, what happens is:
# node1 sends the first file A with a StreamHeader that says that A, B and C
are the pending files and that A is the file currently being sent. On node2, a
new StreamInSession is created with that information.
# Once A is finished, node2 removes A from the pending files in the
StreamInSession and sends an acknowledgement to node1. Then node1 sends B with
a StreamHeader that has no pending files (basically the list of pending files
is only sent the first time, so that the StreamInSession on node2 knows when
everything is finished) and B as the current file. When node2 receives that
StreamHeader, it retrieves the StreamInSession and sets B as the current file.
# Once B is finished, node2 removes it from the pending files and acks to
node1, and node1 sends C with a StreamHeader with no pending files and C as the
current file. Node2 retrieves the StreamInSession and modifies it accordingly.
# At last, once C is finished, node2 removes it from the pending files. Then it
realizes that the pending files are empty, and thus that the streaming is
finished, and at that point it adds all the SSTableReaders created so far to
the cfs (and acks the end of the streaming to node1). A simplified sketch of
this bookkeeping follows the list.
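To make the above concrete, here is a minimal sketch of that bookkeeping. The
names (InSessionSketch, begin, fileFinished, etc.) are hypothetical; the real
StreamInSession is obviously more involved:
{code:java}
// Minimal sketch of the bookkeeping described above; names are hypothetical,
// not the actual StreamInSession API.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class InSessionSketch
{
    private final Set<String> pendingFiles = new HashSet<>();      // filled from the first StreamHeader only
    private final List<Object> createdReaders = new ArrayList<>(); // SSTableReaders built as files arrive
    private String currentFile;

    // Called for the first StreamHeader of a session.
    void begin(Set<String> pending, String current)
    {
        pendingFiles.addAll(pending);
        currentFile = current;
    }

    // Called for subsequent StreamHeaders, which only carry the current file.
    void setCurrentFile(String file)
    {
        currentFile = file;
    }

    // Called when the current file has been fully received.
    void fileFinished(Object reader)
    {
        createdReaders.add(reader);
        pendingFiles.remove(currentFile);
        sendAck();
        if (pendingFiles.isEmpty())
        {
            addReadersToCfs(createdReaders); // only now do the new SSTables become visible
            ackSessionComplete();
        }
    }

    void sendAck() { /* ack the finished file to the sender */ }
    void ackSessionComplete() { /* tell the sender the whole session is done */ }
    void addReadersToCfs(List<Object> readers) { /* hand the readers to the ColumnFamilyStore */ }
}
{code}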
Now, the problem is if, say, node1 is marked dead by mistake by node2 during,
say, the streaming of A. If that happens, the only thing we do on node2 is to
close the session and remove the StreamInSession from the global sessions map.
However, we don't shut down the stream or anything, so if node1 is in fact
still alive, what will happen is:
# A will finish its transfer correctly. Once that's done, node2 will still send
an acknowledgement (probably the first mistake: we could check that the session
has been closed and send an error instead).
# Node1, getting its acknowledgement, will send B with a StreamHeader that has
B as the current file and no pending files, as usual. On reception, node2 will
not find any StreamInSession (it has been removed during the close), and so it
will create a new one as if that was the beginning of a transfer. And that
session will have no pending files (second mistake: if we have to create a new
StreamInSession but there are no pending files at all, something wrong has
happened).
# Once B is fully streamed, node2 will acknowledge it to node1 and remove it
from its StreamInSession. But that session is the new one we just created, with
no pending files. So the StreamInSession will consider the streaming finished,
and it will thus add the SSTableReader for B to the cfs.
# Because B has been acknowledged, node1 will start sending C (again, with no
pending files in the StreamHeader). This happens as soon as B is finished, and
so concurrently with the StreamInSession on node2 closing itself.
# So when node2 receives the StreamHeader with C, it will try to retrieve the
session, will find the previous session, and will happily add C as the current
file for that session (third and fourth mistakes: a StreamInSession should not
add a file as current unless it is a pending file for that session, and a
session could detect that it is being reused even though it has just declared
itself finished).
# Now when the transfer of C finishes, the session will be notified, and since
it still has no pending files, it will once again consider the streaming
complete. But since it is still the same session, it still has the
SSTableReader for B in its list of created readers (as well as the one for C
now). And that's when it adds B to the DataTracker for a second time. The
illustrative replay after this list walks through that sequence.
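Replaying that sequence against the sketch above (purely illustrative, not real
Cassandra code) gives:
{code:java}
// Illustrative replay of the failure, using the InSessionSketch above.
Object readerForB = new Object(), readerForC = new Object();

InSessionSketch ghost = new InSessionSketch();              // created when B's header arrives after the close
ghost.begin(java.util.Collections.<String>emptySet(), "B"); // no pending files announced (mistake #2)
ghost.fileFinished(readerForB); // pending is already empty -> B is added to the cfs, session looks complete
ghost.setCurrentFile("C");      // the "finished" session gets reused (mistakes #3 and #4)
ghost.fileFinished(readerForC); // still no pending files -> createdReaders now holds B and C,
                                //   so B is handed to the DataTracker a second time
{code}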
I also note that we end up never adding the SSTableReader for A to the cfs,
since the very first StreamInSession was never finished. This is not a big deal
in that the stream itself has been reported as failed to the client anyway, but
just to say that it's not just a problem of duplicating an SSTableReader
reference.
Anyway, let me come back to what I said earlier. We should definitely fix some,
if not all, of the "mistakes" above (and send a SESSION_FAILURE to node1 as
soon as we detect something is wrong).
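For illustration, here is roughly what such guards could look like, with
hypothetical names; the actual fix may well be shaped differently:
{code:java}
// Hypothetical guards for the mistakes listed above; illustrative only, not the actual patch.
import java.util.HashSet;
import java.util.Set;

class GuardedInSession
{
    private final Set<String> pendingFiles = new HashSet<>();
    private boolean closed;
    private boolean complete;
    private String currentFile;

    // mistake #2: a session created on the fly must come with a non-empty pending list
    static GuardedInSession createNew(Set<String> pendingFromHeader, String current)
    {
        if (pendingFromHeader.isEmpty())
            throw new IllegalStateException("new stream session with no pending files");
        GuardedInSession s = new GuardedInSession();
        s.pendingFiles.addAll(pendingFromHeader);
        s.currentFile = current;
        return s;
    }

    // mistakes #3 and #4: refuse a file that was never announced, or a session that already finished
    void setCurrentFile(String file)
    {
        if (complete || !pendingFiles.contains(file))
        {
            sendSessionFailure();
            return;
        }
        currentFile = file;
    }

    // mistake #1: a closed session should report failure instead of acking
    void fileFinished()
    {
        if (closed)
        {
            sendSessionFailure();
            return;
        }
        pendingFiles.remove(currentFile);
        if (pendingFiles.isEmpty())
            complete = true;
    }

    void close() { closed = true; }
    void sendSessionFailure() { /* send SESSION_FAILURE back to the sender */ }
}
{code}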
But that being said, my remark about the comment in DataTracker being obsolete
still stands, and replacing the list with a set in there would at least have
the advantage of slightly simplifying the code of DataTracker.View.newSSTables(),
as well as being more resilient if an SSTableReader is added twice. Not a big
deal though.
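To illustrate the idea (a hypothetical simplification, not the actual
DataTracker.View code): with a set, a duplicate add is simply absorbed.
{code:java}
// Hypothetical simplification: with a Set, adding the same reader twice is a no-op
// instead of producing a duplicate entry the view has to account for.
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ViewSketch
{
    final Set<Object> sstables;

    ViewSketch(Set<Object> sstables)
    {
        this.sstables = Collections.unmodifiableSet(sstables);
    }

    ViewSketch withNewSSTable(Object reader)
    {
        Set<Object> copy = new HashSet<>(sstables);
        copy.add(reader); // duplicate adds are absorbed instead of corrupting counts
        return new ViewSketch(copy);
    }
}
{code}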
> Error in LeveledCompactionStrategy
> ----------------------------------
>
> Key: CASSANDRA-3306
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3306
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 1.0.0
> Reporter: Radim Kolar
> Assignee: Yuki Morishita
> Attachments: 0001-CASSANDRA-3306-test.patch
>
>
> during stress testing, i always get this error making leveledcompaction
> strategy unusable. Should be easy to reproduce - just write fast.
> ERROR [CompactionExecutor:6] 2011-10-04 15:48:52,179
> AbstractCassandraDaemon.java (line 133) Fatal exception in thread
> Thread[CompactionExecutor:6,5,main]
> java.lang.AssertionError
> at
> org.apache.cassandra.db.DataTracker$View.newSSTables(DataTracker.java:580)
> at
> org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:546)
> at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:268)
> at
> org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:232)
> at
> org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:960)
> at
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:199)
> at
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:47)
> at
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:131)
> at
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> and this is in json data for table:
> {
> "generations" : [ {
> "generation" : 0,
> "members" : [ 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470,
> 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484 ]
> }, {
> "generation" : 1,
> "members" : [ ]
> }, {
> "generation" : 2,
> "members" : [ ]
> }, {
> "generation" : 3,
> "members" : [ ]
> }, {
> "generation" : 4,
> "members" : [ ]
> }, {
> "generation" : 5,
> "members" : [ ]
> }, {
> "generation" : 6,
> "members" : [ ]
> }, {
> "generation" : 7,
> "members" : [ ]
> } ]
> }