[
https://issues.apache.org/jira/browse/CASSANDRA-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063143#comment-15063143
]
Paulo Motta commented on CASSANDRA-10797:
-----------------------------------------
The fix on 3.0 was much smoother and more straightforward with the new transactional
system. The only change was to finish {{SSTableWriters}} as soon as they're
received and update the transactions with the corresponding {{SSTableReaders}}.
Thanks for the tip!
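As a rough illustration of the change described above (not the actual patch), the sketch below models a receiver that finishes each writer as soon as its file arrives and tracks only the resulting reader in the transaction. All names here ({{Writer}}, {{Reader}}, {{Transaction}}, {{receive}}) are hypothetical stand-ins, not Cassandra's API, and the buffer is only an assumed proxy for whatever memory an open writer holds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the receive path; none of these types are Cassandra classes.
class StreamReceiveSketch {

    // Stand-in for a finished, read-only sstable: cheap to keep around.
    static final class Reader {
        final String name;
        boolean deleted = false;
        Reader(String name) { this.name = name; }
    }

    // Stand-in for an open writer: holds a buffer until it is finished (assumption).
    static final class Writer {
        final String name;
        byte[] buffer;
        Writer(String name, int bufferSize) {
            this.name = name;
            this.buffer = new byte[bufferSize];
        }
        // Finishing releases the buffer and yields a reader for the same sstable.
        Reader finish() {
            buffer = null;
            return new Reader(name);
        }
    }

    // Stand-in for the transaction that tracks the received sstables.
    static final class Transaction {
        private final List<Reader> readers = new ArrayList<>();
        private boolean closed = false;

        void update(Reader reader) {
            if (closed) throw new IllegalStateException("transaction already closed");
            readers.add(reader);
        }

        // Commit: hand over a copy of the tracked readers.
        List<Reader> commit() {
            closed = true;
            return new ArrayList<>(readers);
        }

        // Abort: mark every tracked sstable as deleted.
        void abort() {
            closed = true;
            for (Reader r : readers) r.deleted = true;
        }

        int size() { return readers.size(); }
    }

    // Finish the writer immediately on receipt and track only the reader,
    // so no open writer outlives the file that produced it.
    static Reader receive(Transaction txn, Writer writer) {
        Reader reader = writer.finish();
        txn.update(reader);
        return reader;
    }

    public static void main(String[] args) {
        Transaction txn = new Transaction();
        for (int i = 0; i < 1000; i++) {
            receive(txn, new Writer("sstable-" + i, 64 * 1024));
        }
        System.out.println("tracked readers: " + txn.size());
    }
}
```

In this model, the number of live writer buffers stays at one at a time regardless of how many sstables are streamed, which is the property the fix is after.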
I also added tests to verify that transactions with mixed {{SSTableWriters}} and
{{SSTableReaders}} are cleaned up correctly if the transaction is aborted. Since
there is no longer a rename when the sstable transitions from {{SSTableWriter}}
to {{SSTableReader}}, transactions left uncommitted and unaborted by a node failure
mid-transaction will have their sstables cleaned up as usual on the
next startup by {{LifecycleTransaction.removeUnfinishedLeftovers(metadata)}} in
{{ColumnFamilyStore.scrubDataDirectories()}} (this is thoroughly tested in
{{LogTransactionTest}}).
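The startup cleanup mentioned above can be pictured with a small in-memory model. This is only a sketch under stated assumptions: {{TxnRecord}} and {{cleanUpUnfinished}} are hypothetical names, the data directory is modelled as a set of file names, and nothing here reflects Cassandra's real on-disk log format or the signature of {{removeUnfinishedLeftovers}}.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of startup cleanup; not Cassandra's log format or API.
class LeftoverCleanupSketch {

    // Stand-in for a persisted transaction record: the sstables it added,
    // and whether a commit was recorded before the node stopped.
    static final class TxnRecord {
        final List<String> addedSSTables = new ArrayList<>();
        boolean committed = false;
    }

    // Remove the sstables of every transaction that never committed.
    // Returns the names that were actually removed from the directory.
    static Set<String> cleanUpUnfinished(Set<String> dataDirectory, List<TxnRecord> records) {
        Set<String> removed = new HashSet<>();
        for (TxnRecord record : records) {
            if (record.committed) continue; // committed sstables stay
            for (String name : record.addedSSTables) {
                // No rename happened, so the recorded name is the name on disk.
                if (dataDirectory.remove(name)) removed.add(name);
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        Set<String> dir = new HashSet<>(List.of("a", "b", "c"));

        TxnRecord done = new TxnRecord();
        done.addedSSTables.add("a");
        done.committed = true;

        TxnRecord interrupted = new TxnRecord();
        interrupted.addedSSTables.add("b");
        interrupted.addedSSTables.add("c");

        Set<String> removed = cleanUpUnfinished(dir, List.of(done, interrupted));
        System.out.println("removed: " + removed.size() + ", remaining: " + dir.size());
    }
}
```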
Below are the branches and tests (not yet finished):
||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10797]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10797]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10797-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10797-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10797-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10797-dtest/lastCompletedBuild/testReport/]|
> Bootstrap new node fails with OOM when streaming nodes contains thousands of
> sstables
> -------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10797
> Project: Cassandra
> Issue Type: Bug
> Components: Streaming and Messaging
> Environment: Cassandra 2.1.8.621 w/G1GC
> Reporter: Jose Martinez Poblete
> Assignee: Paulo Motta
> Fix For: 3.0.x, 3.x
>
> Attachments: 10797-nonpatched.png, 10797-patched.png,
> 10798-nonpatched-500M.png, 10798-patched-500M.png, 112415_system.log,
> Heapdump_OOM.zip, Screen Shot 2015-12-01 at 7.34.40 PM.png, dtest.tar.gz
>
>
> When adding a new node to an existing DC, it runs OOM after 25-45 minutes.
> Upon reviewing the heap dump, it was found that the sending nodes are streaming
> thousands of sstables, which in turn blows the bootstrapping node's heap.
> {noformat}
> ERROR [RMI Scheduler(0)] 2015-11-24 10:10:44,585 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [STREAM-IN-/173.36.28.148] 2015-11-24 10:10:44,585 StreamSession.java:502 - [Stream #0bb13f50-92cb-11e5-bc8d-f53b7528ffb4] Streaming error occurred
> java.lang.IllegalStateException: Shutdown in progress
>     at java.lang.ApplicationShutdownHooks.remove(ApplicationShutdownHooks.java:82) ~[na:1.8.0_65]
>     at java.lang.Runtime.removeShutdownHook(Runtime.java:239) ~[na:1.8.0_65]
>     at org.apache.cassandra.service.StorageService.removeShutdownHook(StorageService.java:747) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.utils.JVMStabilityInspector$Killer.killCurrentJVM(JVMStabilityInspector.java:95) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:64) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:66) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.streaming.messages.IncomingFileMessage$1.deserialize(IncomingFileMessage.java:38) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:55) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:250) ~[cassandra-all-2.1.8.621.jar:2.1.8.621]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
> ERROR [RMI TCP Connection(idle)] 2015-11-24 10:10:44,585 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
> java.lang.OutOfMemoryError: Java heap space
> ERROR [OptionalTasks:1] 2015-11-24 10:10:44,585 CassandraDaemon.java:223 - Exception in thread Thread[OptionalTasks:1,5,main]
> java.lang.IllegalStateException: Shutdown in progress
> {noformat}
> Attached is the Eclipse MAT report as a zipped web page