GitHub user miguno commented on the pull request:
https://github.com/apache/storm/pull/268#issuecomment-72652704
Some additional feedback regarding Storm's behavior in the face of
failures, with and without this patch. This summary is slightly simplified to
make it a shorter read.
### Without this patch
* Tested Storm versions: 0.9.2, 0.9.3, 0.10.0-SNAPSHOT
* Configuration: default Storm settings (cf. `conf/defaults.yaml`)
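If I read `conf/defaults.yaml` correctly, the Netty-transport defaults relevant here are the following (they match the retry and buffer values visible in the logs further below):
```
storm.messaging.netty.buffer_size: 5242880
storm.messaging.netty.max_retries: 300
storm.messaging.netty.max_wait_ms: 1000
storm.messaging.netty.min_wait_ms: 100
```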
Here, we can confirm the cascading failure previously described in this
pull request and the JIRA ticket.
Consider a simple topology such as:
```
+-----> bolt1 -----> bolt2
|
spout -----+
|
+-----> bolt3
```
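For reference, a minimal sketch of how such a topology is wired with Storm's Java `TopologyBuilder` (a sketch only: the spout/bolt classes and parallelism values are placeholders, not our actual test code):
```
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new WordSpout(), 2);  // WordSpout: placeholder spout
// bolt1 and bolt3 both subscribe to the spout; bolt2 subscribes to bolt1
builder.setBolt("bolt1", new ForwarderBolt(), 2).shuffleGrouping("spout");
builder.setBolt("bolt2", new BoltOfDeath(), 2).shuffleGrouping("bolt1");
builder.setBolt("bolt3", new ForwarderBolt(), 2).shuffleGrouping("spout");
```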
* If the instances of `bolt2` die (say, because of a runtime exception),
then `bolt1` instances will enter a reconnect-until-success-or-die loop.
* If Storm decides to place the restarted `bolt2` instances on
different workers (read: machine+port pairs), then `bolt1` will eventually die.
* If Storm places the restarted `bolt2` instances on the same workers,
then the `bolt1` instances will not die because one of their reconnection
attempts will succeed, and normal operation will resume.
* If `bolt1` dies, too, then the `spout` instances enter the same
reconnect-until-success-or-die loop. Hence the cascading nature of the failure.
On top of that, in larger clusters we also observed the following later
phases of this cascading failure, where each phase is less likely to occur
than the previous one:
1. Other spouts/bolts of the same topology -- "friends of friends" and so
on -- may enter such loops. In the example above, `bolt3` may start to die,
too.
2. Eventually the full topology may become dysfunctional, a zombie: not
dead but not alive either.
3. Other topologies in the cluster may then become zombies, too.
4. The full Storm cluster may enter a zombie state. This state may even
turn out to be unrecoverable without a full cluster restart.
> Funny anecdote: Because of a race condition you may even observe Storm
spouts/bolts starting to talk to the wrong peers, e.g. `spout` talking
directly to `bolt2`, even though this violates the wiring of our topology.
### With this patch
* Tested Storm versions: We primarily tested a patched 0.10.0-SNAPSHOT
version but also briefly tested a patched 0.9.3.
* Configuration: default Storm settings (cf. `conf/defaults.yaml`)
Here, the behavior is different: we no longer observed cascading failures,
but at the expense of "silent data loss" (see below).
Again, consider this example topology:
```
+-----> bolt1 -----> bolt2
|
spout -----+
|
+-----> bolt3
```
With the patch, when the instances of `bolt2` die, the instances of
`bolt1` will continue to run; i.e. they no longer enter a
reconnect-until-success-or-die loop (and, in particular, they no longer die,
which was the purpose of the patch).
**bolt2 behavior**
We wrote a special-purpose "storm-bolt-of-death" topology that
consistently throws runtime exceptions in `bolt2` (aka the bolt of death)
whenever it receives an input tuple. The following example shows the timeline
of `bolt2` crashing intentionally. We observed that once the `bolt2` instances
were restarted -- and Storm would typically restart the instances on the same
workers (read: machine+port combinations) -- they would not receive any
new input tuples even though their upstream peer `bolt1` was up and running and
constantly emitting output tuples.
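For illustration, a Java equivalent of the bolt of death might look like the following minimal sketch (the actual bolt is written in Scala, cf. `BoltOfDeath.scala` in the stack traces; the exception message below is taken verbatim from the logs):
```
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class BoltOfDeath extends BaseBasicBolt {
  @Override
  public void execute(Tuple input, BasicOutputCollector collector) {
    // The exception propagates out of execute(), the executor's async
    // loop dies, and the whole worker process is halted ("Worker died").
    throw new RuntimeException(
        "Intentionally throwing this exception to trigger bolt failures");
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // This bolt never emits anything.
  }
}
```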
Summary of the `bolt2` log snippet below:
* This `bolt2` instance dies at `12:49:21`, followed by an immediate
restart (here: on the same machine+port).
* The `bolt2` instance is up and running at `12:49:32`, but it would not
process any new input tuples until `52 mins` later.
* In our testing we found that the restarted `bolt2` instances consistently
took `52 mins` (!) to receive their first new input tuple from `bolt1`.
```
# New input tuple => let's crash! Now the shutdown procedure begins.
2015-02-03 12:49:21 c.v.s.t.s.b.BoltOfDeath [ERROR] Intentionally throwing
this exception to trigger bolt failures
2015-02-03 12:49:21 b.s.util [ERROR] Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: Intentionally
throwing this exception to trigger bolt failures
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$fn__6773$fn__6786$fn__6837.invoke(executor.clj:798)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: java.lang.RuntimeException: Intentionally throwing this
exception to trigger bolt failures
at
com.verisign.storm.tools.sbod.bolts.BoltOfDeath.execute(BoltOfDeath.scala:77)
~[stormjar.jar:0.1.0-SNAPSHOT]
at
backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$fn__6773$tuple_action_fn__6775.invoke(executor.clj:660)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$mk_task_receiver$fn__6696.invoke(executor.clj:416)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
... 6 common frames omitted
2015-02-03 12:49:21 b.s.d.executor [ERROR]
java.lang.RuntimeException: java.lang.RuntimeException: Intentionally
throwing this exception to trigger bolt failures
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$fn__6773$fn__6786$fn__6837.invoke(executor.clj:798)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
Caused by: java.lang.RuntimeException: Intentionally throwing this
exception to trigger bolt failures
at
com.verisign.storm.tools.sbod.bolts.BoltOfDeath.execute(BoltOfDeath.scala:77)
~[stormjar.jar:0.1.0-SNAPSHOT]
at
backtype.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$fn__6773$tuple_action_fn__6775.invoke(executor.clj:660)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$mk_task_receiver$fn__6696.invoke(executor.clj:416)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
... 6 common frames omitted
2015-02-03 12:49:21 b.s.util [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:329)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.6.0.jar:na]
at
backtype.storm.daemon.worker$fn__7196$fn__7197.invoke(worker.clj:536)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.executor$mk_executor_data$fn__6606$fn__6607.invoke(executor.clj:246)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:482)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down worker
bolt-of-death-topology-1-1422964754 783bcc5b-b571-4c7e-94f4-72ff49edc35e 6702
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client
Netty-Client-supervisor2/10.0.0.102:6702
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be
sent with Netty-Client-supervisor2/10.0.0.102:6702..., timeout: 600000ms,
pendings: 0
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client
Netty-Client-supervisor1/10.0.0.101:6702
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be
sent with Netty-Client-supervisor1/10.0.0.101:6702..., timeout: 600000ms,
pendings: 0
2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down receive thread
2015-02-03 12:49:21 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too
large (300). Pinning to 29
2015-02-03 12:49:21 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The
baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [300]
2015-02-03 12:49:21 b.s.m.n.Client [INFO] New Netty Client, connect to
supervisor3, 6702, config: , buffer_size: 5242880
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-supervisor3/127.0.1.1:6702... [0]
2015-02-03 12:49:21 b.s.m.n.Client [INFO] connection established to a
remote host Netty-Client-supervisor3/127.0.1.1:6702, [id: 0x8415ae96,
/127.0.1.1:48837 => supervisor3/127.0.1.1:6702]
2015-02-03 12:49:21 b.s.m.loader [INFO] Shutting down receiving-thread:
[bolt-of-death-topology-1-1422964754, 6702]
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Closing Netty Client
Netty-Client-supervisor3/127.0.1.1:6702
2015-02-03 12:49:21 b.s.m.n.Client [INFO] Waiting for pending batchs to be
sent with Netty-Client-supervisor3/127.0.1.1:6702..., timeout: 600000ms,
pendings: 0
2015-02-03 12:49:21 b.s.m.loader [INFO] Waiting for
receiving-thread:[bolt-of-death-topology-1-1422964754, 6702] to die
2015-02-03 12:49:21 b.s.m.loader [INFO] Shutdown receiving-thread:
[bolt-of-death-topology-1-1422964754, 6702]
2015-02-03 12:49:21 b.s.d.worker [INFO] Shut down receive thread
2015-02-03 12:49:21 b.s.d.worker [INFO] Terminating messaging context
2015-02-03 12:49:21 b.s.d.worker [INFO] Shutting down executors
2015-02-03 12:49:21 b.s.d.executor [INFO] Shutting down executor
bolt-of-death-A2:[3 3]
2015-02-03 12:49:21 b.s.util [INFO] Async loop interrupted!
# Now the restart begins, which happened to be on the same machine+port.
2015-02-03 12:49:28 o.a.s.z.ZooKeeper [INFO] Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2015-02-03 12:49:28 o.a.s.z.ZooKeeper [INFO] Client
environment:host.name=supervisor3
[...]
2015-02-03 12:49:32 b.s.d.executor [INFO] Prepared bolt bolt-of-death-A2:(3)
# Now the bolt instance would stay idle for the next 52 mins.
# Only metrics-related log messages were reported.
2015-02-03 12:50:31 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor2/10.0.0.102:6702
2015-02-03 12:50:31 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor1/10.0.0.101:6702
2015-02-03 12:50:31 b.s.m.n.Server [INFO] Getting metrics for server on 6702
2015-02-03 12:51:31 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor2/10.0.0.102:6702
2015-02-03 12:51:31 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor1/10.0.0.101:6702
2015-02-03 12:51:31 b.s.m.n.Server [INFO] Getting metrics for server on 6702
[...]
```
Until that first new input tuple arrived, the restarted instances were idle,
with logs showing:
```
# Restart-related messages end with the "I am prepared!" log line below.
2015-02-03 11:58:52 b.s.d.executor [INFO] Prepared bolt bolt2:(3)
# Then, for the next 52 minutes, only metrics-related log messages were reported.
2015-02-03 11:59:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor2/10.0.0.102:6702
2015-02-03 11:59:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor1/10.0.0.101:6702
2015-02-03 11:59:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
2015-02-03 12:00:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor2/10.0.0.102:6702
2015-02-03 12:00:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor1/10.0.0.101:6702
2015-02-03 12:00:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
2015-02-03 12:01:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor2/10.0.0.102:6702
2015-02-03 12:01:52 b.s.m.n.Client [INFO] Getting metrics for connection to
supervisor1/10.0.0.101:6702
2015-02-03 12:01:52 b.s.m.n.Server [INFO] Getting metrics for server on 6702
# 52 minutes after the bolt restart it finally started to process data again.
2015-02-03 12:49:21 ...
```
**bolt1 behavior**
During this time the upstream peer `bolt1` happily reported an increasing
number of emitted tuples, and there were no errors in the UI or in the logs.
Here is an example log snippet of `bolt1` from the time when `bolt2` died
(`ForwarderBolt` is `bolt1`; a minimal sketch of it follows after this list).
* `bolt1` complains about a failed connection to `bolt2` at `12:52:24`,
which is about `3 mins` after the `bolt2` instance died at `12:49:21`.
* `bolt1` subsequently reports that it re-established a connection to
`bolt2` at `12:52:24` (the log timestamp granularity is 1 second).
* `bolt1` reports 9 new output tuples, but -- if my understanding of the
new patch is correct -- the actual sending now happens asynchronously.
* `bolt1` complains about another failed connection to `bolt2` at
`12:52:25` (and another connection failure to a second instance of `bolt2` at
`12:52:26`).
* `bolt1` would then keep reporting new output tuples, but those would not
reach the downstream `bolt2` instances until 52 minutes later.
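As mentioned, `ForwarderBolt` is our `bolt1`. A minimal Java sketch of what it does (the actual implementation is in Scala; the output field name is illustrative):
```
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ForwarderBolt extends BaseBasicBolt {
  private static final Logger log = LoggerFactory.getLogger(ForwarderBolt.class);

  @Override
  public void execute(Tuple input, BasicOutputCollector collector) {
    // Produces the "Forwarding tuple ..." lines in the log snippet below.
    log.info("Forwarding tuple {}", input);
    // Pass the tuple's payload through unchanged. Note that emitting only
    // hands the tuple to the worker's transfer queue; the network send to
    // the downstream bolt happens asynchronously.
    collector.emit(new Values(input.getValue(0)));
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
  }
}
```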
```
2015-02-03 12:52:24 b.s.m.n.StormClientErrorHandler [INFO] Connection
failed Netty-Client-supervisor3/10.0.0.103:6702
java.nio.channels.ClosedChannelException: null
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.send(Client.java:279)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:24 b.s.m.n.Client [INFO] failed to send requests to
supervisor3/10.0.0.103:6702:
java.nio.channels.ClosedChannelException: null
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.send(Client.java:279)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:24 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-supervisor3/10.0.0.103:6702... [0]
2015-02-03 12:52:24 b.s.m.n.Client [INFO] connection established to a
remote host Netty-Client-supervisor3/10.0.0.103:6702, [id: 0xa58c119c,
/10.0.0.102:44392 => supervisor3/10.0.0.103:6702]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [bertels]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [mike]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [bertels]
2015-02-03 12:52:24 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [nathan]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [nathan]
2015-02-03 12:52:25 b.s.m.n.StormClientErrorHandler [INFO] Connection
failed Netty-Client-supervisor3/10.0.0.103:6702
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_75]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
~[na:1.7.0_75]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
~[na:1.7.0_75]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_75]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
~[na:1.7.0_75]
at
org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_75]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [jackson]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [nathan]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [jackson]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [mike]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:25 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [bertels]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [mike]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [mike]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [jackson]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [bertels]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [mike]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [nathan]
2015-02-03 12:52:26 b.s.m.n.StormClientErrorHandler [INFO] Connection
failed Netty-Client-supervisor4/10.0.0.104:6702
java.nio.channels.ClosedChannelException: null
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.send(Client.java:279)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) ~[clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:26 b.s.m.n.Client [INFO] failed to send requests to
supervisor4/10.0.0.104:6702:
java.nio.channels.ClosedChannelException: null
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.cleanUpWriteBuffer(AbstractNioWorker.java:433)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.writeFromUserCode(AbstractNioWorker.java:128)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:84)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendDownstream(DefaultChannelPipeline.java:779)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:725)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.doEncode(OneToOneEncoder.java:71)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:59)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:591)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:582)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:704)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.storm.netty.channel.Channels.write(Channels.java:671)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.AbstractChannel.write(AbstractChannel.java:248)
~[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.flushRequest(Client.java:398)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.messaging.netty.Client.send(Client.java:279)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086$fn__7087.invoke(worker.clj:351)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__7086.invoke(worker.clj:349)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$clojure_handler$reify__871.onEvent(disruptor.clj:58)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
backtype.storm.disruptor$consume_loop_STAR_$fn__884.invoke(disruptor.clj:94)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at backtype.storm.util$async_loop$fn__550.invoke(util.clj:472)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:26 b.s.m.n.Client [INFO] Reconnect started for
Netty-Client-supervisor4/10.0.0.104:6702... [0]
2015-02-03 12:52:26 b.s.m.n.Client [INFO] connection established to a
remote host Netty-Client-supervisor4/10.0.0.104:6702, [id: 0xa9bd1839,
/10.0.0.102:37751 => supervisor4/10.0.0.104:6702]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [jackson]
2015-02-03 12:52:26 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [mike]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [bertels]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [bertels]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [bertels]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [nathan]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [jackson]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [jackson]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [nathan]
2015-02-03 12:52:27 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [bertels]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [mike]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [golda]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [mike]
2015-02-03 12:52:28 b.s.m.n.StormClientErrorHandler [INFO] Connection
failed Netty-Client-supervisor4/10.0.0.104:6702
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_75]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
~[na:1.7.0_75]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
~[na:1.7.0_75]
at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_75]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
~[na:1.7.0_75]
at
org.apache.storm.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
org.apache.storm.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
[storm-core-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_75]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_75]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [golda]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:10, stream: default, id: {}, [mike]
2015-02-03 12:52:28 c.v.s.t.s.b.ForwarderBolt [INFO] Forwarding tuple
source: wordSpout:9, stream: default, id: {}, [jackson]
# From this point on the bolt would continuously report new output tuples
# (cf. the "Forwarding tuple" log messages above) until its downstream peer
# came fully back to life (read: 52 mins after the restart).
```
### Current conclusion
At the moment this patch does not seem to improve the situation. (I
wouldn't rule out that we screwed up merging the patch into the current 0.10.0
master, but we didn't run into any merge conflicts, so I'd say we applied
the patch correctly.) Silent data loss is as bad as, and arguably worse than,
a cascading failure.
PS: We're about to share the storm-bolt-of-death topology, which should help
the various people involved in this thread reproduce this issue
deterministically.