[ 
https://issues.apache.org/jira/browse/STORM-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14564737#comment-14564737
 ] 

ASF GitHub Bot commented on STORM-839:
--------------------------------------

Github user miguno commented on the pull request:

    https://github.com/apache/storm/pull/566#issuecomment-106795826
  
    > My reasoning is as follows:
    > 
    > * One has three options to deal with Netty's buffer filling up:
    >     1. Discard incoming new messages
    >     2. Block client thread until there is space (back pressure)
    >     3. Keep buffering up until OOME is thrown
    >
    > My guess is that the v0.9.4 code attempted to implement option (i), but
    > the actual behavior is option (iii).
    
    The code's intention was actually (iii).  As you described, back pressure
    (ii) was not picked because it would require a significant amount of work
    and was thus out of scope for fixing STORM-329.  The reason (iii) was
    preferred over (i) was also as you described: if (and only if) users have
    enabled acking (i.e. guaranteed message processing) for a topology, they can
    prevent OOM errors by setting
    [`topology.max.spout.pending`](https://github.com/apache/storm/blob/master/conf/defaults.yaml#L180).
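
The three options discussed above can be sketched with plain `java.util.concurrent` queues. This is only an illustration of the trade-off, not the actual Storm or Netty buffering code; the class `BufferPolicies` and its method names are hypothetical, and option (ii) uses a timed wait here so the example terminates (real back pressure would block until space is available).

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration; not Storm's or Netty's actual buffering code. */
final class BufferPolicies {

    /** Option (i): drop the new message when the bounded buffer is full. */
    static boolean sendOrDiscard(BlockingQueue<String> buffer, String msg) {
        return buffer.offer(msg); // false means the message was discarded
    }

    /** Option (ii): back pressure; the caller waits (here: up to timeoutMs) for space. */
    static boolean sendOrBlock(BlockingQueue<String> buffer, String msg, long timeoutMs) {
        try {
            return buffer.offer(msg, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Option (iii): unbounded buffer; always accepts, so memory grows until an OOME. */
    static int sendUnbounded(LinkedBlockingQueue<String> buffer, String msg) {
        buffer.add(msg);
        return buffer.size();
    }

    public static void main(String[] args) {
        BlockingQueue<String> bounded = new ArrayBlockingQueue<>(2);
        sendOrDiscard(bounded, "a");
        sendOrDiscard(bounded, "b");
        System.out.println("discard policy accepted c: " + sendOrDiscard(bounded, "c"));
        System.out.println("blocking policy accepted c: " + sendOrBlock(bounded, "c", 50));

        LinkedBlockingQueue<String> unbounded = new LinkedBlockingQueue<>();
        sendUnbounded(unbounded, "a");
        sendUnbounded(unbounded, "b");
        System.out.println("unbounded size after c: " + sendUnbounded(unbounded, "c"));
    }
}
```

With nothing draining the bounded queue, both (i) and the timed variant of (ii) reject the third message, while (iii) keeps accepting; limiting the number of in-flight tuples is what keeps (iii) from growing without bound.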
    
    I'll have to look at your code in more detail before commenting.  I'll also
    ping @ptgoetz, @clockfly, @tedxia, and @revans2, who were involved in this
    significant patch (sorry for the spam, folks!).


> Deadlock hazard in backtype.storm.messaging.netty.Client
> --------------------------------------------------------
>
>                 Key: STORM-839
>                 URL: https://issues.apache.org/jira/browse/STORM-839
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.9.4
>            Reporter: Enno Shioji
>            Priority: Critical
>
> See the thread dump below that shows the deadlock. client-worker-1 is holding 
> 7b5a7fa5 and waiting on 1446a1e9. Thread-10-disruptor-worker-transfer-queue 
> is holding 1446a1e9 and is waiting on 7b5a7fa5.
> (Thread dump is truncated to show only the relevant parts)
> 2015-05-28 15:37:15
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.72-b04 mixed mode):
> "Thread-10-disruptor-worker-transfer-queue" - Thread t@52
>    java.lang.Thread.State: BLOCKED
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:398)
>       - waiting to lock <7b5a7fa5> (a java.lang.Object) owned by "client-worker-1" t@25
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
>       at org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
>       at org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
>       at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
>       at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
>       at org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
>       at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
>       at org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
>       at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
>       at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
>       at org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
>       at backtype.storm.messaging.netty.Client.flushMessages(Client.java:480)
>       - locked <1446a1e9> (a backtype.storm.messaging.netty.Client)
>       at backtype.storm.messaging.netty.Client.send(Client.java:412)
>       - locked <1446a1e9> (a backtype.storm.messaging.netty.Client)
>       at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
>       at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014$fn__5015.invoke(worker.clj:334)
>       at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__5014.invoke(worker.clj:332)
>       at backtype.storm.disruptor$clojure_handler$reify__1446.onEvent(disruptor.clj:58)
>       at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
>       at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
>       at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
>       at backtype.storm.disruptor$consume_loop_STAR_$fn__1459.invoke(disruptor.clj:94)
>       at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463)
>       at clojure.lang.AFn.run(AFn.java:24)
>       at java.lang.Thread.run(Unknown Source)
>    Locked ownable synchronizers:
>       - None
> "client-worker-1" - Thread t@25
>    java.lang.Thread.State: BLOCKED
>       at backtype.storm.messaging.netty.Client.closeChannelAndReconnect(Client.java:501)
>       - waiting to lock <1446a1e9> (a backtype.storm.messaging.netty.Client) owned by "Thread-10-disruptor-worker-transfer-queue" t@52
>       at backtype.storm.messaging.netty.Client.access$1400(Client.java:78)
>       at backtype.storm.messaging.netty.Client$3.operationComplete(Client.java:492)
>       at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:427)
>       at org.apache.storm.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:413)
>       at org.apache.storm.netty.channel.DefaultChannelFuture.setFailure(DefaultChannelFuture.java:380)
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:437)
>       - locked <7b5a7fa5> (a java.lang.Object)
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:373)
>       at org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:93)
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
>       at org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
>       at org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>       at org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
>       at org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>       at java.lang.Thread.run(Unknown Source)
>    Locked ownable synchronizers:
>       - locked <75e528fd> (a java.util.concurrent.ThreadPoolExecutor$Worker)
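
The cycle in the dump is a lock-ordering inversion: one thread takes the `Client` monitor and then wants Netty's write lock, while the other takes the write lock and then wants the `Client` monitor. A minimal sketch of that pattern, assuming two plain `Object` monitors as stand-ins for `<1446a1e9>` and `<7b5a7fa5>`, is shown below. The class `LockOrderDemo` and its methods are hypothetical and are not taken from Storm; the JVM's `ThreadMXBean` is used to confirm the deadlock, much as a thread dump would.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

/** Hypothetical reproduction of the lock-order inversion; not Storm code. */
final class LockOrderDemo {

    /** Starts a daemon thread that locks `first`, waits for its peer, then locks `second`. */
    static Thread lockInOrder(String name, Object first, Object second,
                              CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await(); // make sure both threads hold their first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // never reached when the two threads use opposite orders
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though the thread stays blocked
        t.start();
        return t;
    }

    /** Returns true if `id` appears in the JVM's current list of deadlocked threads. */
    static boolean isDeadlocked(long id) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) return false;
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    /** Runs the inversion once and polls for up to two seconds for the deadlock. */
    static boolean runDemo() {
        Object clientLock = new Object(); // stand-in for the Client monitor
        Object writeLock = new Object();  // stand-in for Netty's write lock
        CountDownLatch latch = new CountDownLatch(2);
        Thread transfer = lockInOrder("transfer", clientLock, writeLock, latch);
        Thread worker = lockInOrder("client-worker", writeLock, clientLock, latch);

        long deadline = System.currentTimeMillis() + 2000;
        while (System.currentTimeMillis() < deadline) {
            if (isDeadlocked(transfer.getId()) && isDeadlocked(worker.getId())) return true;
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + runDemo());
    }
}
```

Acquiring the two locks in the same order on every path, or not invoking code that needs the second lock while still holding the first, removes the cycle in this sketch.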



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
