Hello All,
[ ] -1 Do not release this package because storm-kafka-client spout
glitches can crash workers, leading to degraded performances.
I have downloaded the binary artifacts of this storm 1.2.0rc2, copied
the binaries of storm-kafka-client, flux and flux-wrapper from the
Nessus staging repository quoted by Taylor, and ran tests on a
relatively modest configuration (1 VM for Nimbus, 1 VM for a
Supervisor node, 1 VM a Zookeeper node) with Java 8 update 172 on
CentOS 7. We use storm-kafka-client with Kafka 0.10.2.0 libs against a
large cluster of Kafka Brokers at version 1.0.1. We have ~15
topologies running on this setup.
The first glitch I noticed is a that, unlike with Storm 1.2.0 which we
use in production, we had deserialization exceptions in of our our
Spout:
- With Storm 1.2.0, these exceptions were somehow "swallowed", and
this spout would consume nothing and not crashing its worker anyway
- With Storm 1.2.2rc2, these exceptions showed up, with a crash of the
Spout's worker process. Storm restarts the worker, and then the same
crash occurs not long afte. All this leads to many Netty errors in all
our topologies' logs, which clearly means bad performances (CPU load
quite high on the Supervisor VM)
After fixing the cause of the deserialization exception (which was in
our code), this issue disappeared.
The second glitch which I just noticed is that we have a another
"situation" which is similar but yet a little bit different : on
another of our topologies, we have a spout throwing the following
exception, also leading to its worker's crash:
2018-05-06 06:55:49.560 o.a.s.k.s.KafkaSpout
Thread-6-eventKafkaSpout-executor[3 3] [INFO] Initialization complete
2018-05-06 06:55:49.636 o.a.s.util Thread-6-eventKafkaSpout-executor[3
3] [ERROR] Async loop died!
java.lang.NullPointerException: null
at
org.apache.storm.kafka.spout.KafkaSpout.emitOrRetryTuple(KafkaSpout.java:507)
~[stormjar.jar:?]
at
org.apache.storm.kafka.spout.KafkaSpout.emitIfWaitingNotEmitted(KafkaSpout.java:440)
~[stormjar.jar:?]
at
org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:308)
~[stormjar.jar:?]
at
org.apache.storm.daemon.executor$fn__10727$fn__10742$fn__10773.invoke(executor.clj:654)
~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484)
[storm-core-1.2.2.jar:1.2.2]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
2018-05-06 06:55:49.641 o.a.s.d.executor
Thread-6-eventKafkaSpout-executor[3 3] [ERROR]
java.lang.NullPointerException: null
at
org.apache.storm.kafka.spout.KafkaSpout.emitOrRetryTuple(KafkaSpout.java:507)
~[stormjar.jar:?]
at
org.apache.storm.kafka.spout.KafkaSpout.emitIfWaitingNotEmitted(KafkaSpout.java:440)
~[stormjar.jar:?]
at
org.apache.storm.kafka.spout.KafkaSpout.nextTuple(KafkaSpout.java:308)
~[stormjar.jar:?]
at
org.apache.storm.daemon.executor$fn__10727$fn__10742$fn__10773.invoke(executor.clj:654)
~[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:484)
[storm-core-1.2.2.jar:1.2.2]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
2018-05-06 06:55:49.702 o.a.s.util Thread-6-eventKafkaSpout-executor[3
3] [ERROR] Halting process: ("Worker died")
java.lang.RuntimeException: ("Worker died")
at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
[storm-core-1.2.2.jar:1.2.2]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.7.0.jar:?]
at
org.apache.storm.daemon.worker$fn__11404$fn__11405.invoke(worker.clj:792)
[storm-core-1.2.2.jar:1.2.2]
at
org.apache.storm.daemon.executor$mk_executor_data$fn__10612$fn__10613.invoke(executor.clj:281)
[storm-core-1.2.2.jar:1.2.2]
at org.apache.storm.util$async_loop$fn__553.invoke(util.clj:494)
[storm-core-1.2.2.jar:1.2.2]
at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_172]
2018-05-06 06:55:49.706 o.a.s.d.worker Thread-15 [INFO] Shutting down
worker
metricAggregation_ec2-34-248-249-45-eu-west-1-compute-amazonaws-com_defaultStormTopic-106-1525589734
871ced6c-14c4-4a59-a774-579bf357314f 6714
2018-05-06 06:55:49.707 o.a.s.d.worker Thread-15 [INFO] Terminating
messaging context
2018-05-06 06:55:49.707 o.a.s.d.worker Thread-15 [INFO] Shutting down executors
This exception looks like
https://issues.apache.org/jira/browse/STORM-3032, but I can't tell for
sure, except that the line number of the exception corresponds to this
line of KafkaSpout.java:
offsetManagers.get(tp).addToEmitMsgs(msgId.offset());
To sum up, I am voting [-1] on this 1.2.0rc2, because I have the
feeling that exceptions in Kafka Spout should be gracefully caught and
never lead to work crashes. I understand that the root cause of these
exceptions can come from the specific code we have in our topologies,
and for the 1st case I was glad to see it because it was an easy fix
on our side, but nevertheless on a production system, one can have
sometimes exceptions and the performance pain of workers crash is
simply not affordable in production.
I don't know if a JIRA already exists on this generic issue that kafka
spout exceptions should be gracefully catched (and maybe lead to
"Failed" tuples, so that there would be a tracking anyway? or at least
a log message in Storm UI ?)
Please note that this JIRA would differ from STORM-3032 because
STORM-3032 seems to be specific to one case, where as in my opinion
the crash issue is more generic - as my two different cases show.
Best regards,
Alexandre Vermeerbergen
2018-05-03 19:18 GMT+02:00 P. Taylor Goetz <[email protected]>:
> CORRECTION: The Nexus staging repository for this rc is:
>
> https://repository.apache.org/content/repositories/orgapachestorm-1064
>
>
> On May 3, 2018, at 11:42 AM, P. Taylor Goetz <[email protected]> wrote:
>
> This is a call to vote on releasing Apache Storm 1.2.2 (rc2)
>
> Full list of changes in this release:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.2.2-rc2/RELEASE_NOTES.html
>
> The tag/commit to be voted upon is v1.2.2:
>
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=tree;h=7cb19fb3befa65e5ff9e5e02f38e16de865982a9;hb=e001672cf0ea59fe6989b563fb6bbb450fe8e7e5
>
> The source archive being voted upon can be found here:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.2.2-rc2/apache-storm-1.2.2-src.tar.gz
>
> Other release files, signatures and digests can be found here:
>
> https://dist.apache.org/repos/dist/dev/storm/apache-storm-1.2.2-rc2/
>
> The release artifacts are signed with the following key:
>
> https://git-wip-us.apache.org/repos/asf?p=storm.git;a=blob_plain;f=KEYS;hb=22b832708295fa2c15c4f3c70ac0d2bc6fded4bd
>
> The Nexus staging repository for this release is:
>
> https://repository.apache.org/content/repositories/orgapachestorm-1062
>
> Please vote on releasing this package as Apache Storm 1.2.2.
>
> When voting, please list the actions taken to verify the release.
>
> This vote will be open for 72 hours or until at least 3 PMC members vote +1.
>
> [ ] +1 Release this package as Apache Storm 1.2.2
> [ ] 0 No opinion
> [ ] -1 Do not release this package because...
>
> Thanks to everyone who contributed to this release.
>
> -Taylor
>
>