[
https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685312#comment-16685312
]
Benedict edited comment on CASSANDRA-14554 at 11/13/18 2:42 PM:
----------------------------------------------------------------
I think the situation would actually be, essentially, undefined behaviour?
Since even with this patch we've got a race for modifying the various
collections between {{abort()}} and {{trackNew()}}. Even if this window is
narrow, this is probably not acceptable for the fix?
We could perhaps create a synchronised transaction object that wraps the
{{LifecycleTransaction}} and exposes largely the API you have exposed, but also
synchronises its prepare/commit/abort phases? This would avoid the possibility
of corrupting the internal state and producing undefined behaviour.
But we're still at least broken on Windows platforms (unless their FS semantics
have changed), because we abort the shared transaction before we have closed
our in-progress file handles, and so we cannot delete all of the sstables. Not
sure what state that leaves us in, but hopefully it would be recovered on
restart.
Ultimately, the child transaction approach still feels easier to reason about
for me. With the right comments, I don't see why we should worry about people
misusing it for concurrent scenario - and we could only synchronise the
internal methods (and only synchronise externally the transfer method, for
clarity).
On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}.
It's generic, and very close to {{Tracker}} which is a much more global thing.
Perhaps {{LifecycleTransactionNewTracker}} or something, to make clear its
scope? The variables wouldn't need to be renamed, also. I don't see us ever
wanting to back this by something other than a {{LifecycleTransaction}}?
Probably we should have originally called {{LifecycleTransaction}}
{{LifecycleTxn}}, or simply {{SSTableTxn}}
was (Author: benedict):
I think the situation would actually be, essentially, undefined behaviour?
Since even with this patch we've got a race for modifying the various
collections between {{abort()}} and {{trackNew()}}. Even if this window is
narrow, this is probably not acceptable for the fix?
We could perhaps create a synchronised transaction object that wraps the
{{LifecycleTransaction}} and exposes largely the API you have exposed, but also
synchronises its prepare/commit/abort phases? This would avoid the possibility
of corrupting the internal state and producing undefined behaviour.
But we're still at least broken on Windows platforms (unless their FS semantics
have changed), because we abort the shared transaction before we have closed
our in-progress file handles, and so we cannot delete all of the sstables. Not
sure what state that leaves us in, but hopefully it would be recovered on
restart.
On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}.
It's generic, and very close to {{Tracker}} which is a much more global thing.
Perhaps {{LifecycleTransactionNewTracker}} or something, to make clear its
scope? The variables wouldn't need to be renamed, also. I don't see us ever
wanting to back this by something other than a {{LifecycleTransaction}}?
Probably we should have originally called {{LifecycleTransaction}}
{{LifecycleTxn}}, or simply {{SSTableTxn}}
> LifecycleTransaction encounters ConcurrentModificationException when used in
> multi-threaded context
> ---------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-14554
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
> Project: Cassandra
> Issue Type: Bug
> Reporter: Dinesh Joshi
> Assignee: Dinesh Joshi
> Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter
> this exception -
> {quote}java.util.ConcurrentModificationException: null
> at
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
> at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
> at java.lang.Iterable.forEach(Iterable.java:74)
> at
> org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
> at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
> at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
> at
> org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
> at
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and
> share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}}
> which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context
> and this leads to streaming failures due to object sharing. On trunk, this
> object is shared across all threads that transfer sstables in parallel for
> the given {{TableId}} in a {{StreamSession}}. There are two options to solve
> this - make {{LifecycleTransaction}} and the associated objects thread safe,
> scope the transaction to a single {{CassandraIncomingFile}}. The consequences
> of the latter option is that if we experience streaming failure we may have
> redundant SSTables on disk. This is ok as compaction should clean this up. A
> third option is we synchronize access in the streaming infrastructure.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]