[
https://issues.apache.org/jira/browse/STORM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14612982#comment-14612982
]
ASF GitHub Bot commented on STORM-839:
--------------------------------------
GitHub user eshioji opened a pull request:
https://github.com/apache/storm/pull/617
Storm 763/839 0.10.x
This is a port of PR #568 from 0.9.x to 0.10.x. It fixes STORM-839
(Deadlock) and STORM-763 (Establish Netty reconnects asynchronously and reduce
verbosity of error logs).
The content is equivalent to the PR for 0.11.x (#616).
@revans2 This is the PR for 0.10.x. I also ran the performance test for
this, and the results were again in line with what I got with 0.9.x. I get a
build failure on my machine, but I get the same failure with 0.10.x HEAD, so
I'm guessing it's unrelated to this change.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/eshioji/storm STORM-763_0.10.x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/617.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #617
----
commit 286bacdf49937d1e8576eff27dfc887824ffdbbb
Author: Enno Shioji <[email protected]>
Date: 2015-07-02T18:01:01Z
This fixes STORM-763 and STORM-839
commit a2502c3bc3bcd4caf3800bb645058abb61d2a071
Author: Enno Shioji <[email protected]>
Date: 2015-07-02T23:51:32Z
Merge remote-tracking branch 'upstream/0.10.x-branch' into STORM-763_0.10.x
# Conflicts:
# storm-core/src/jvm/backtype/storm/messaging/netty/Client.java
commit f5db06ce2809c3b66f2d797f979f8c40133c2f60
Author: Enno Shioji <[email protected]>
Date: 2015-07-03T00:02:00Z
Bring back removal of client from context on closing
----
> Deadlock hazard in backtype.storm.messaging.netty.Client
> --------------------------------------------------------
>
> Key: STORM-839
> URL: https://issues.apache.org/jira/browse/STORM-839
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 0.9.4
> Reporter: Enno Shioji
> Priority: Critical
>
> See the thread dump below, which shows the deadlock. client-worker-1 is
> holding 7b5a7fa5 and waiting on 1446a1e9. Thread-10-disruptor-worker-transfer-queue
> is holding 1446a1e9 and waiting on 7b5a7fa5.
> (The thread dump is truncated to show only the relevant parts.)
> 2015-05-28 15:37:15
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode):
> "Thread-10-disruptor-worker-transfer-queue" - Thread t@52
> java.lang.Thread.State: BLOCKED
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
> - waiting to lock <7b5a7fa5> (a java.lang.Object) owned by "client-worker-1" t@25
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
> at org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
> at org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
> at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
> at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
> at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
> at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
> at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
> at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
> at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
> at org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
> at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
> - locked <1446a1e9> (a backtype.storm.messaging.netty.Client)
> at backtype.storm.messaging.netty.Client.send(Client.java:412)
> - locked <1446a1e9> (a backtype.storm.messaging.netty.Client)
> at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
> at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014$fn__5015.invoke(worker.clj:334)
> at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014.invoke(worker.clj:332)
> at backtype.storm.disruptor$clojure_handler$reify__1446.onEvent(disruptor.clj:58)
> at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
> at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
> at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
> at backtype.storm.disruptor$consume_loop_STAR_$fn__1459.invoke(disruptor.clj:94)
> at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
> at clojure.lang.AFn.run(AFn.java:24)
> at java.lang.Thread.run(Unknown Source)
> Locked ownable synchronizers:
> - None
> "client-worker-1" - Thread t@25
> java.lang.Thread.State: BLOCKED
> at backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
> - waiting to lock <1446a1e9> (a backtype.storm.messaging.netty.Client) owned by "Thread-10-disruptor-worker-transfer-queue" t@52
> at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
> at backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
> at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
> at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
> at org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:437)
> - locked <7b5a7fa5> (a java.lang.Object)
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
> at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
> at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
> at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
> at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
> at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
> at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> Locked ownable synchronizers:
> - locked <75e528fd> (a java.util.concurrent.ThreadPoolExecutor$Worker)
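Illustrative note: the dump shows a lock-order inversion. One thread takes the Client monitor and then needs Netty's write lock, while the I/O thread holds the write lock and its failure listener then needs the Client monitor. The sketch below is not the code from the pull request; it is a minimal, self-contained model of the general idea of handing the reconnect off to another thread so that the listener never acquires the Client monitor while the write lock is held. The class LockOrderSketch, the two lock objects, and the single-thread executor are hypothetical stand-ins.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LockOrderSketch {
    // Hypothetical stand-in for Netty's internal write lock (<7b5a7fa5> in the dump).
    private static final Object writeLock = new Object();
    // Hypothetical stand-in for the Client monitor (<1446a1e9> in the dump).
    private static final Object clientLock = new Object();

    /** Returns true if both threads and the reconnect task finish, i.e. no deadlock. */
    public static boolean runAsyncHandOff() throws InterruptedException {
        ExecutorService reconnector = Executors.newSingleThreadExecutor();
        CountDownLatch senderHoldsClient = new CountDownLatch(1);
        CountDownLatch ioHoldsWrite = new CountDownLatch(1);
        CountDownLatch reconnected = new CountDownLatch(1);

        // Models the transfer thread: Client monitor first, then the write lock.
        Thread sender = new Thread(() -> {
            synchronized (clientLock) {
                senderHoldsClient.countDown();
                await(ioHoldsWrite); // force the interleaving seen in the dump
                synchronized (writeLock) {
                    // the channel write would happen here
                }
            }
        });

        // Models the I/O thread: holds the write lock while its listener fires.
        Thread io = new Thread(() -> {
            synchronized (writeLock) {
                ioHoldsWrite.countDown();
                await(senderHoldsClient);
                // Taking clientLock right here would reproduce the deadlock.
                // Instead, schedule the reconnect and return immediately.
                reconnector.execute(() -> {
                    synchronized (clientLock) {
                        reconnected.countDown(); // the reconnect would happen here
                    }
                });
            }
        });

        sender.start();
        io.start();
        sender.join(5000);
        io.join(5000);
        boolean ok = reconnected.await(5, TimeUnit.SECONDS)
                && !sender.isAlive() && !io.isAlive();
        reconnector.shutdownNow();
        return ok;
    }

    private static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed without deadlock: " + runAsyncHandOff());
    }
}
```

The latches only exist to force the same interleaving as in the dump. With the reconnect scheduled on a separate thread, the I/O thread releases the write lock without ever waiting on the Client monitor, so the sender can proceed and the scheduled task runs once the monitor is free.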
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)