Github user miguno commented on the pull request:

    https://github.com/apache/storm/pull/268#issuecomment-72652704
  
    Some additional feedback regarding Storm's behavior in the face of 
failures, with and without this patch.  This summary is slightly simplified to 
make it a shorter read.
    
    ### Without this patch
    
    * Tested Storm versions: 0.9.2, 0.9.3, 0.10.0-SNAPSHOT
    * Configuration: default Storm settings (cf. `conf/defaults.yaml`)
    
    Here, we can confirm the cascading failure previously described in this 
pull request and the JIRA ticket.
    
    Consider a simple topology such as:
    
    ```
                    +-----> bolt1 -----> bolt2
                    |
         spout -----+
                    |
                    +-----> bolt3
    ```
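    
    For reference, a minimal sketch of how such a topology could be wired with Storm's `TopologyBuilder` (the component classes and parallelism hints below are placeholders, not our actual test code):
    
    ```
    // Illustrative wiring only (backtype.storm.topology.TopologyBuilder);
    // Spout1/Bolt1/Bolt2/Bolt3 are placeholder component classes.
    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new Spout1(), 2);
    // spout fans out to bolt1 and bolt3; bolt1 additionally feeds bolt2
    builder.setBolt("bolt1", new Bolt1(), 2).shuffleGrouping("spout");
    builder.setBolt("bolt2", new Bolt2(), 2).shuffleGrouping("bolt1");
    builder.setBolt("bolt3", new Bolt3(), 2).shuffleGrouping("spout");
    ```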
    
    * If the instances of `bolt2` die (say, because of a runtime exception), 
then `bolt1` instances will enter a reconnect-until-success-or-die loop.
        * If Storm decides to place the restarted `bolt2` instances on 
different workers (read: machine+port pairs), then `bolt1` will eventually die.
        * If Storm places the restarted `bolt2` instances on the same workers, 
then the `bolt1` instances will not die because one of their reconnection 
attempts will succeed, and normal operation will resume.
    * If `bolt1` dies, too, then the `spout` instances enter the same reconnect-until-success-or-die loop (see the sketch below for the kind of loop we mean).  Hence the cascading nature of the failure.
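    
    To illustrate what we mean by a reconnect-until-success-or-die loop, here is a minimal, purely illustrative sketch of a bounded, exponentially backed-off reconnect.  This is not Storm's actual Netty `Client` code; the retry values simply mirror the `baseSleepTimeMs [100]`, `maxSleepTimeMs [1000]` and `maxRetries [300]` settings that show up in the worker logs further down, and `tryConnect` is a hypothetical helper:
    
    ```
    // Illustrative only -- not Storm's actual Netty Client implementation.
    public class ReconnectLoopSketch {
        static final int MAX_RETRIES = 300;     // cf. "maxRetries [300]" in the worker logs below
        static final long BASE_SLEEP_MS = 100;  // cf. "baseSleepTimeMs [100]"
        static final long MAX_SLEEP_MS = 1000;  // cf. "maxSleepTimeMs [1000]"

        // "Reconnect until success or die": retry with bounded exponential backoff;
        // if every attempt fails, throw, which (pre-patch) brings the worker down.
        static void reconnectUntilSuccessOrDie(String host, int port) throws InterruptedException {
            long sleepMs = BASE_SLEEP_MS;
            for (int attempt = 1; attempt <= MAX_RETRIES; attempt++) {
                if (tryConnect(host, port)) {
                    return;  // success: normal operation resumes
                }
                Thread.sleep(sleepMs);
                sleepMs = Math.min(2 * sleepMs, MAX_SLEEP_MS);
            }
            throw new RuntimeException("Giving up reconnecting to " + host + ":" + port);
        }

        // Placeholder for an actual (Netty) connection attempt.
        static boolean tryConnect(String host, int port) {
            try (java.net.Socket socket = new java.net.Socket(host, port)) {
                return true;
            } catch (java.io.IOException e) {
                return false;
            }
        }
    }
    ```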
    
    On top of that, in larger clusters we also observed the following later phases of this cascading failure, where each phase is less likely to occur than the previous one:
    
    1. Other spouts/bolts of the same topology -- "friends of friends" and so 
on -- may enter such loops.  In the example above, `bolt3` may start to die, 
too.
    2. Eventually the full topology may become dysfunctional, a zombie: not dead but not alive either.
    3. Other topologies in the cluster may then become zombies, too.
    4. The full Storm cluster may enter a zombie state.  This state may even turn out to be unrecoverable without a full cluster restart.
    
    > Funny anecdote: Because of a race condition you may even observe that Storm spouts/bolts will begin to talk to the wrong peers, e.g. `spout` will talk directly to `bolt2`, even though this violates the wiring of our topology.
    
    ### With this patch
    
    * Tested Storm versions: We primarily tested a patched 0.10.0-SNAPSHOT 
version but also tested a patched 0.9.3 briefly.
    * Configuration: default Storm settings (cf. `conf/defaults.yaml`)
    
    Here, the behavior is different: we no longer observed cascading failures, but at the expense of "silent data loss" (see below).
    
    Again, consider this example topology:
    
    ```
                    +-----> bolt1 -----> bolt2
                    |
         spout -----+
                    |
                    +-----> bolt3
    ```
    
    With the patch, when the instances of `bolt2` die, the instances of `bolt1` will continue to run; i.e. they will no longer enter a reconnect-until-success-or-die loop (which, particularly the not-dying part, was the purpose of the patch).
    
    **bolt2 behavior**
    
    We wrote a special-purpose "storm-bolt-of-death" topology that would consistently throw runtime exceptions in `bolt2` (aka the bolt of death) whenever it received an input tuple.  The following example shows the timeline of `bolt2` crashing intentionally.  We observed that once the `bolt2` instances were restarted -- and Storm would typically restart the instances on the same workers (read: machine+port combinations) -- they would not receive any new input tuples, even though their upstream peer `bolt1` was up and running and constantly emitting output tuples.
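    
    For context, the bolt of death itself is trivial.  Below is a minimal sketch of what such a bolt could look like; we assume a basic bolt because the stack traces below go through `BasicBoltExecutor`, and our actual `BoltOfDeath` implementation may differ in detail:
    
    ```
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Tuple;

    // Throws on every input tuple, which crashes the hosting worker and triggers a restart.
    public class BoltOfDeathSketch extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            throw new RuntimeException("Intentionally throwing this exception to trigger bolt failures");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // No output fields: this bolt never emits anything.
        }
    }
    ```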
    
    Summary of the `bolt2` log snippet below:
    
    * This `bolt2` instance dies at `12:49:21`, followed by an immediate 
restart (here: on the same machine+port).
    * The `bolt2` instance is back up and running at `12:49:32`, but it does not process any new input tuples until `52 mins` later.
        * In our testing we found that the restarted `bolt2` instances consistently took `52 mins` (!) to receive their first "new" input tuple from `bolt1`.
    
    ```
    # New input tuple => let's crash!   Now the shutdown procedure begins.
    
    2015-02-03 12:49:21 c.v.s.t.s.b.BoltOfDeath [ERROR] Intentionally throwing 
this exception to trigger bolt failures
    2015-02-03 12:49:21 b.s.util [ERROR] Async loop died!
    java.lang.RuntimeException: java.lang.RuntimeException: Intentionally 
throwing this exception to trigger bolt failures
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$fn__6773$fn__6786$fn__6837.invoke(executor.clj:798)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    Caused by: java.lang.RuntimeException: Intentionally throwing this 
exception to trigger bolt failures
        at 
com.verisign.storm.tools.sbod.bolts.BoltOfDeath.execute(BoltOfDeath.scala:77) 
~[stormjar.jar:0.1.0-SNAPSHOT]
        at 
backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$fn__6773$tuple_action_fn__6775.invoke(executor.clj:660)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6696.invoke(executor.clj:416)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        ... 6 common frames omitted
    2015-02-03 12:49:21 b.s.d.executor [ERROR]
    java.lang.RuntimeException: java.lang.RuntimeException: Intentionally 
throwing this exception to trigger bolt failures
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$fn__6773$fn__6786$fn__6837.invoke(executor.clj:798)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    Caused by: java.lang.RuntimeException: Intentionally throwing this 
exception to trigger bolt failures
        at 
com.verisign.storm.tools.sbod.bolts.BoltOfDeath.execute(BoltOfDeath.scala:77) 
~[stormjar.jar:0.1.0-SNAPSHOT]
        at 
backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$fn__6773$tuple_action_fn__6775.invoke(executor.clj:660)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$mk_task_receiver$fn__6696.invoke(executor.clj:416)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        ... 6 common frames omitted
    2015-02-03 12:49:21 b.s.util [ERROR] Halting process: ("Worker died")
    java.lang.RuntimeException: ("Worker died")
        at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:329) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:na]
        at 
backtype.storm.daemon.worker$fn__7196$fn__7197.invoke(worker.clj:536) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.executor$mk_executor_data$fn__6606$fn__6607.invoke(executor.clj:246)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:482) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down worker 
bolt-of-death-topology-1-1422964754 783bcc5b-b571-4c7e-94f4-72ff49edc35e 6702
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-supervisor2/10.0.0.102:6702
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be 
sent with Netty-Client-supervisor2/10.0.0.102:6702..., timeout: 600000ms, 
pendings: 0
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-supervisor1/10.0.0.101:6702
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be 
sent with Netty-Client-supervisor1/10.0.0.101:6702..., timeout: 600000ms, 
pendings: 0
    2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down receive thread
    2015-02-03 12:49:21 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too 
large (300). Pinning to 29
    2015-02-03 12:49:21 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The 
baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [300]
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] New Netty Client, connect to 
supervisor3, 6702, config: , buffer_size: 5242880
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor3/127.0.1.1:6702... [0]
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] connection established to a 
remote host Netty-Client-supervisor3/127.0.1.1:6702, [id: 0x8415ae96, 
/127.0.1.1:48837 => supervisor3/127.0.1.1:6702]
    2015-02-03 12:49:21 b.s.m.loader [INFO] Shutting down receiving-thread: 
[bolt-of-death-topology-1-1422964754, 6702]
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client 
Netty-Client-supervisor3/127.0.1.1:6702
    2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be 
sent with Netty-Client-supervisor3/127.0.1.1:6702..., timeout: 600000ms, 
pendings: 0
    2015-02-03 12:49:21 b.s.m.loader [INFO] Waiting for 
receiving-thread:[bolt-of-death-topology-1-1422964754, 6702] to die
    2015-02-03 12:49:21 b.s.m.loader [INFO] Shutdown receiving-thread: 
[bolt-of-death-topology-1-1422964754, 6702]
    2015-02-03 12:49:21 b.s.d.worker [INFO] Shut down receive thread
    2015-02-03 12:49:21 b.s.d.worker [INFO] Terminating messaging context
    2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down executors
    2015-02-03 12:49:21 b.s.d.executor [INFO] Shutting down executor 
bolt-of-death-A2:[3 3]
    2015-02-03 12:49:21 b.s.util [INFO] Async loop interrupted!
    
    # Now the restart begins, which happened to be on the same machine+port.
    
    2015-02-03 12:49:28 o.a.s.z.ZooKeeper [INFO] Client 
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
    2015-02-03 12:49:28 o.a.s.z.ZooKeeper [INFO] Client 
environment:host.name=supervisor3
    [...]
    2015-02-03 12:49:32 b.s.d.executor [INFO] Prepared bolt bolt-of-death-A2:(3)
    
    # Now the bolt instance would stay idle for the next 52 mins.
    # Only metrics related log messages were reported.
    
    2015-02-03 12:50:31 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor2/10.0.0.102:6702
    2015-02-03 12:50:31 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor1/10.0.0.101:6702
    2015-02-03 12:50:31 b.s.m.n.Server [INFO] Getting metrics for server on 6702
    2015-02-03 12:51:31 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor2/10.0.0.102:6702
    2015-02-03 12:51:31 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor1/10.0.0.101:6702
    2015-02-03 12:51:31 b.s.m.n.Server [INFO] Getting metrics for server on 6702
    [...]
    ```
    
    Until then, the restarted `bolt2` instances were idle, with logs showing:
    
    ```
    # Restart-related messages end with the "I am prepared!" log line below.
    2015-02-03 11:58:52 b.s.d.executor [INFO] Prepared bolt bolt2:(3)
    
    # Then, for the next 52 minutes, only log messages relating to metrics were 
reported.
    2015-02-03 11:59:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor2/10.0.0.102:6702
    2015-02-03 11:59:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor1/10.0.0.101:6702
    2015-02-03 11:59:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
    2015-02-03 12:00:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor2/10.0.0.102:6702
    2015-02-03 12:00:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor1/10.0.0.101:6702
    2015-02-03 12:00:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
    2015-02-03 12:01:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor2/10.0.0.102:6702
    2015-02-03 12:01:52 b.s.m.n.Client [INFO] Getting metrics for connection to 
supervisor1/10.0.0.101:6702
    2015-02-03 12:01:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
    
    # 52 minutes after the bolt restart it finally started to process data 
again.
    2015-02-03 12:49:21 ...
    ```
    
    **bolt1 behavior**
    
    During this time the upstream peer `bolt1` happily reported an increasing 
number of emitted tuples, and there were no errors in the UI or in the logs.  
Here is an example log snippet of `bolt1` at the time when `bolt2` died 
(`ForwarderBolt` is `bolt1`).
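    
    For context, judging by the log lines below, `ForwarderBolt` simply logs each incoming tuple and re-emits its values.  A minimal sketch of such a forwarder (the declared field name is illustrative, and our actual implementation may differ):
    
    ```
    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Logs each incoming tuple and forwards its values downstream unchanged.
    public class ForwarderBoltSketch extends BaseBasicBolt {
        private static final Logger log = LoggerFactory.getLogger(ForwarderBoltSketch.class);

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            log.info("Forwarding tuple {}", input);
            collector.emit(input.getValues());
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));  // field name is illustrative
        }
    }
    ```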
    
    * `bolt1` complains about a failed connection to `bolt2` at `12:52:24`, which is about `3 mins` after the `bolt2` instance died at `12:49:21`.
    * `bolt1` subsequently reports it re-established a connection to `bolt2` at 
`12:52:24` (the log timestamp granularity is 1 second).
        * `bolt1` reports 9 new output tuples, but (if my understanding of the new patch is correct) the actual sending now happens asynchronously.
    * `bolt1` complains about another failed connection to `bolt2` at 
`12:52:25` (and another connection failure to a second instance of `bolt2` at 
`12:52:26`).
    * `bolt1` would then report new output tuples, but those would not reach the downstream `bolt2` instances until 52 minutes later.
    
    ```
    2015-02-03 12:52:24 b.s.m.n.StormClientErrorHandler [INFO] Connection 
failed Netty-Client-supervisor3/10.0.0.103:6702
    java.nio.channels.ClosedChannelException: null
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:725) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:704) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:671) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.send(Client.java:279) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:24 b.s.m.n.Client [INFO] failed to send requests to 
supervisor3/10.0.0.103:6702:
    java.nio.channels.ClosedChannelException: null
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:725) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:704) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:671) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.send(Client.java:279) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:24 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor3/10.0.0.103:6702... [0]
    2015-02-03 12:52:24 b.s.m.n.Client [INFO] connection established to a 
remote host Netty-Client-supervisor3/10.0.0.103:6702, [id: 0xa58c119c, 
/10.0.0.102:44392 => supervisor3/10.0.0.103:6702]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [bertels]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [mike]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [bertels]
    2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [nathan]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [nathan]
    2015-02-03 12:52:25 b.s.m.n.StormClientErrorHandler [INFO] Connection 
failed Netty-Client-supervisor3/10.0.0.103:6702
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_75]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) 
~[na:1.7.0_75]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) 
~[na:1.7.0_75]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_75]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) 
~[na:1.7.0_75]
        at 
org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [jackson]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [nathan]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [jackson]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [mike]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [bertels]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [mike]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [mike]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [jackson]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [bertels]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [mike]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [nathan]
    2015-02-03 12:52:26 b.s.m.n.StormClientErrorHandler [INFO] Connection 
failed Netty-Client-supervisor4/10.0.0.104:6702
    java.nio.channels.ClosedChannelException: null
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:725) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:704) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:671) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.send(Client.java:279) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:26 b.s.m.n.Client [INFO] failed to send requests to 
supervisor4/10.0.0.104:6702:
    java.nio.channels.ClosedChannelException: null
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:725) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
 ~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:704) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at org.apache.storm.netty.channel.Channels.write(Channels.java:671) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248) 
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.messaging.netty.Client.send(Client.java:279) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:26 b.s.m.n.Client [INFO] Reconnect started for 
Netty-Client-supervisor4/10.0.0.104:6702... [0]
    2015-02-03 12:52:26 b.s.m.n.Client [INFO] connection established to a 
remote host Netty-Client-supervisor4/10.0.0.104:6702, [id: 0xa9bd1839, 
/10.0.0.102:37751 => supervisor4/10.0.0.104:6702]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [jackson]
    2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [mike]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [bertels]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [bertels]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [bertels]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [nathan]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [jackson]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [jackson]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [nathan]
    2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [bertels]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [mike]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [golda]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [mike]
    2015-02-03 12:52:28 b.s.m.n.StormClientErrorHandler [INFO] Connection 
failed Netty-Client-supervisor4/10.0.0.104:6702
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_75]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) 
~[na:1.7.0_75]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) 
~[na:1.7.0_75]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_75]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) 
~[na:1.7.0_75]
        at 
org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) 
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
 [storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_75]
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_75]
        at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [golda]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:10, stream: default, id: {}, [mike]
    2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple 
source: wordSpout:9, stream: default, id: {}, [jackson]
    
    
    # From this point on the bolt continuously reported new output tuples
    # (cf. the "Forwarding tuple" log messages above) until its downstream peer
    # came fully back to life (read: 52 mins after the restart).
    ```
    
    ### Current conclusion
    
    At the moment this patch does not seem to improve the situation.  (I wouldn't rule out that we screwed up merging the patch into the current 0.10.0 master version, but we didn't run into any merge conflicts, so I'd say we applied the patch correctly.)  Silent data loss is as bad as, and arguably worse than, a cascading failure.
    
    PS: We're about to share the storm-bolt-of-death topology, which might help 
with reproducing this issue in a deterministic way for the various people 
involved in this thread.

