[jira] [Comment Edited] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context

Benedict (JIRA) Tue, 13 Nov 2018 06:43:39 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685312#comment-16685312
 ]


Benedict edited comment on CASSANDRA-14554 at 11/13/18 2:42 PM:
----------------------------------------------------------------

I think the situation would actually be, essentially, undefined behaviour?  
Since even with this patch we've got a race for modifying the various 
collections between {{abort()}} and {{trackNew()}}. Even if this window is 
narrow, this is probably not acceptable for the fix?

We could perhaps create a synchronised transaction object that wraps the 
{{LifecycleTransaction}} and exposes largely the API you have exposed, but also 
synchronises its prepare/commit/abort phases? This would avoid the possibility 
of corrupting the internal state and producing undefined behaviour.

But we're still at least broken on Windows platforms (unless their FS semantics 
have changed), because we abort the shared transaction before we have closed 
our in-progress file handles, and so we cannot delete all of the sstables. Not 
sure what state that leaves us in, but hopefully it would be recovered on 
restart.

Ultimately, the child transaction approach still feels easier to reason about 
for me.  With the right comments, I don't see why we should worry about people 
misusing it for concurrent scenario - and we could only synchronise the 
internal methods (and only synchronise externally the transfer method, for 
clarity).

On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}. 
It's generic, and very close to {{Tracker}} which is a much more global thing. 
Perhaps {{LifecycleTransactionNewTracker}} or something, to make clear its 
scope? The variables wouldn't need to be renamed, also. I don't see us ever 
wanting to back this by something other than a {{LifecycleTransaction}}?

Probably we should have originally called {{LifecycleTransaction}} 
{{LifecycleTxn}}, or simply {{SSTableTxn}}


was (Author: benedict):
I think the situation would actually be, essentially, undefined behaviour?  
Since even with this patch we've got a race for modifying the various 
collections between {{abort()}} and {{trackNew()}}. Even if this window is 
narrow, this is probably not acceptable for the fix?

We could perhaps create a synchronised transaction object that wraps the 
{{LifecycleTransaction}} and exposes largely the API you have exposed, but also 
synchronises its prepare/commit/abort phases? This would avoid the possibility 
of corrupting the internal state and producing undefined behaviour.

But we're still at least broken on Windows platforms (unless their FS semantics 
have changed), because we abort the shared transaction before we have closed 
our in-progress file handles, and so we cannot delete all of the sstables. Not 
sure what state that leaves us in, but hopefully it would be recovered on 
restart.

On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}. 
It's generic, and very close to {{Tracker}} which is a much more global thing. 
Perhaps {{LifecycleTransactionNewTracker}} or something, to make clear its 
scope? The variables wouldn't need to be renamed, also. I don't see us ever 
wanting to back this by something other than a {{LifecycleTransaction}}?

Probably we should have originally called {{LifecycleTransaction}} 
{{LifecycleTxn}}, or simply {{SSTableTxn}}

> LifecycleTransaction encounters ConcurrentModificationException when used in 
> multi-threaded context
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dinesh Joshi
>            Assignee: Dinesh Joshi
>            Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter 
> this exception -
> {quote}java.util.ConcurrentModificationException: null
>  at 
> java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at 
> org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at 
> org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and 
> share it between threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}} 
> which is an {{IncomingStreamMessage}}. This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context 
> and this leads to streaming failures due to object sharing. On trunk, this 
> object is shared across all threads that transfer sstables in parallel for 
> the given {{TableId}} in a {{StreamSession}}. There are two options to solve 
> this - make {{LifecycleTransaction}} and the associated objects thread safe, 
> scope the transaction to a single {{CassandraIncomingFile}}. The consequences 
> of the latter option is that if we experience streaming failure we may have 
> redundant SSTables on disk. This is ok as compaction should clean this up. A 
> third option is we synchronize access in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context

Reply via email to