Ethan Li created STORM-3735:
-------------------------------
Summary: Kryo serialization fails on some metric tuples when
topology.fall.back.on.java.serialization is false
Key: STORM-3735
URL: https://issues.apache.org/jira/browse/STORM-3735
Project: Apache Storm
Issue Type: Bug
Reporter: Ethan Li
When a metrics consumer is used, metrics are sent from all executors to the
consumer. Some of these metrics contain NodeInfo objects, which are not
registered with Kryo, so Kryo serialization fails if
topology.fall.back.on.java.serialization is false.
{code:title=worker logs}
2021-01-13 20:16:37.017 o.a.s.e.ExecutorTransfer
Thread-16-__system-executor[-1, -1] [INFO] TRANSFERRING tuple [dest: 5 tuple:
source: __system:-1, stream: __metrics, id: {}, [TASK_INFO: { host:
openstorm14blue-n4.blue.ygrid.yahoo.com:6703 comp: __system[-1]}, [
[CGroupCpuStat = {nr.throttled-percentage=46.544980443285525,
nr.period-count=767, nr.throttled-count=357, throttled.time-ms=27208}],
[CGroupMemoryLimit = 1342177280], [__recv-iconnection = {dequeuedMessages=0,
enqueued={/10.215.73.210:47038=3169}}], [__send-ico
nnection = {NodeInfo(node:149a917b-bc75-49c8-b351-f74b8ae0fbed-10.215.73.210,
port:[6701])={reconnects=1, src=/10.215.73.210:34938, pending=0,
dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6701, sent=1896,
lostOnSend=0}, NodeInfo(node:149a917b-bc75-
49c8-b351-f74b8ae0fbed-10.215.73.210, port:[6702])={reconnects=8,
src=/10.215.73.210:39476, pending=0,
dest=openstorm14blue-n4.blue.ygrid.yahoo.com/10.215.73.210:6702, sent=2115,
lostOnSend=0},
NodeInfo(node:b77b5ec6-15ee-4bd2-a9b8-12fcadde7744-10.215.73.211, po
rt:[6700])={reconnects=125, pending=0,
dest=openstorm14blue-n5.blue.ygrid.yahoo.com/10.215.73.211:6700, sent=108,
lostOnSend=1331}}], [CGroupMemory = 316485632], [CGroupCpu = {user-ms=36960,
sys-ms=25860}], [memory.pools.Metaspace.usage = 0.9695890907929322], [m
emory.heap.max = 1073741824], [receive-queue-overflow = 0],
[memory.pools.Compressed-Class-Space.used = 6237424],
[memory.pools.Compressed-Class-Space.max = 1073741824], [memory.non-heap.init =
2555904], [worker-transfer-queue-overflow = 0], [memory.pools.Metasp
ace.committed = 42074112], [receive-queue-sojourn_time_ms = 0.0],
[threads.waiting.count = 5], [memory.pools.G1-Eden-Space.usage =
0.2777777777777778], [memory.pools.Metaspace.used = 40798320],
[memory.total.used = 101783888], [memory.pools.Code-Cache.init = 255
5904], [memory.non-heap.committed = 63832064], [GC.G1-Young-Generation.time =
677], [receive-queue-insert_failures = 0.0], [memory.total.init = 130482176],
[GC.G1-Old-Generation.count = 0], [memory.pools.Metaspace.init = 0],
[memory.pools.G1-Survivor-Space.commi
tted = 5242880], [worker-transfer-queue-population = 0],
[memory.pools.Compressed-Class-Space.committed = 6684672],
[threads.timed_waiting.count = 31], [memory.pools.G1-Eden-Space.init =
7340032], [memory.pools.Metaspace.max = -1], [memory.pools.G1-Survivor-Spac
e.used = 5242880], [memory.heap.init = 127926272],
[memory.pools.G1-Old-Gen.used-after-gc = 0], [worker-transfer-queue-capacity =
1024], [memory.pools.G1-Survivor-Space.used-after-gc = 5242880],
[memory.pools.G1-Old-Gen.committed = 47185920], [memory.pools.G1-Ed
en-Space.committed = 75497472], [receive-queue-arrival_rate_secs =
0.109421162052741], [memory.pools.Compressed-Class-Space.usage =
0.0058090537786483765], [TGT-TimeToExpiryMsecs = 71282993],
[threads.runnable.count = 15], [worker-transfer-queue-insert_failures
= 0.0], [worker-transfer-queue-sojourn_time_ms = 0.0], [memory.heap.committed =
127926272], [memory.non-heap.max = -1], [threads.daemon.count = 29],
[memory.pools.Code-Cache.max = 251658240],
[worker-transfer-queue-arrival_rate_secs = 90.47776674390379], [memory
.heap.usage = 0.037109360098838806], [memory.pools.G1-Old-Gen.init =
120586240], [memory.pools.Code-Cache.committed = 15138816],
[receive-queue-pct_full = 0.0], [worker-transfer-queue-pct_full = 0.0],
[receive-queue-population = 0], [memory.pools.Compressed-Clas
s-Space.init = 0], [memory.pools.Code-Cache.usage = 0.059299468994140625],
[worker-transfer-queue-dropped_messages = 0], [GC.G1-Young-Generation.count =
18], [memory.pools.Code-Cache.used = 14923200], [memory.pools.G1-Old-Gen.usage
= 0.012695297598838806], [memo
ry.non-heap.usage = -6.196368E7], [memory.total.max = 1073741823],
[threads.count = 51], [memory.heap.used = 39845872],
[memory.pools.G1-Survivor-Space.init = 0], [memory.pools.G1-Old-Gen.used =
13631472], [receive-queue-dropped_messages = 0], [threads.terminate
d.count = 0], [memory.pools.G1-Eden-Space.max = -1], [uptimeSecs = 76],
[threads.deadlock.count = 0], [threads.blocked.count = 0], [newWorkerEvent =
1], [receive-queue-capacity = 32768], [threads.new.count = 0], [startTimeSecs =
1610568920], [memory.pools.G1-Ede
n-Space.used-after-gc = 0], [memory.pools.G1-Eden-Space.used = 20971520],
[GC.G1-Old-Generation.time = 0], [memory.non-heap.used = 61964384],
[memory.pools.G1-Old-Gen.max = 1073741824], [memory.pools.G1-Survivor-Space.max
= -1], [memory.pools.G1-Survivor-Space.u
sage = 1.0], [memory.total.committed = 191823872], [doHeartbeat-calls.count =
64], [doHeartbeat-calls.m1_rate = 1.0730202200365234E-6],
[doHeartbeat-calls.m5_rate = 1.1636999000665182E-6],
[doHeartbeat-calls.m15_rate = 1.1870955900857726E-6], [doHeartbeat-calls.
mean_rate = 1.0067076836696486E-6]]] PROC_START_TIME(sampled): null
EXEC_START_TIME(sampled): null]
...
2021-01-13 20:16:37.030 o.a.s.u.Utils Thread-16-__system-executor[-1, -1]
[ERROR] Async loop died!
java.lang.RuntimeException: com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Class is not registered:
org.apache.storm.generated.NodeInfo
Note: To register this class use:
kryo.register(org.apache.storm.generated.NodeInfo.class);
Serialization trace:
value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
at org.apache.storm.executor.Executor.accept(Executor.java:294)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.utils.JCQueue.consumeImpl(JCQueue.java:113)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.utils.JCQueue.consume(JCQueue.java:89)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:159)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.bolt.BoltExecutor$1.call(BoltExecutor.java:145)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.utils.Utils$1.run(Utils.java:401)
[storm-client-2.3.0.y.jar:2.3.0.y]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: com.esotericsoftware.kryo.KryoException:
java.lang.IllegalArgumentException: Class is not registered:
org.apache.storm.generated.NodeInfo
Note: To register this class use:
kryo.register(org.apache.storm.generated.NodeInfo.class);
Serialization trace:
value (org.apache.storm.metric.api.IMetricsConsumer$DataPoint)
at
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
~[kryo-3.0.3.jar:?]
at
org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:553)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.daemon.Task.sendUnanchored(Task.java:215)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.executor.Executor.metricsTick(Executor.java:345)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:205)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.executor.Executor.accept(Executor.java:290)
~[storm-client-2.3.0.y.jar:2.3.0.y]
... 6 more
Caused by: java.lang.IllegalArgumentException: Class is not registered:
org.apache.storm.generated.NodeInfo
Note: To register this class use:
kryo.register(org.apache.storm.generated.NodeInfo.class);
at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:488)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:517)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:622)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:106)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:552)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:80)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:518)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:628)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:100)
~[kryo-3.0.3.jar:?]
at
com.esotericsoftware.kryo.serializers.CollectionSerializer.write(CollectionSerializer.java:40)
~[kryo-3.0.3.jar:?]
at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:534)
~[kryo-3.0.3.jar:?]
at
org.apache.storm.serialization.KryoValuesSerializer.serializeInto(KryoValuesSerializer.java:38)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:40)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.daemon.worker.WorkerTransfer.tryTransferRemote(WorkerTransfer.java:118)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.daemon.worker.WorkerState.tryTransferRemote(WorkerState.java:553)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.ExecutorTransfer.tryTransfer(ExecutorTransfer.java:68)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.daemon.Task.sendUnanchored(Task.java:215)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.executor.Executor.metricsTick(Executor.java:345)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at
org.apache.storm.executor.bolt.BoltExecutor.tupleActionFn(BoltExecutor.java:205)
~[storm-client-2.3.0.y.jar:2.3.0.y]
at org.apache.storm.executor.Executor.accept(Executor.java:290)
~[storm-client-2.3.0.y.jar:2.3.0.y]
... 6 more
{code}
The related metric is "__send-iconnection" from
https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/daemon/metrics/BuiltinMetricsUtil.java#L40-L43
Note that this can only be reproduced when metrics are sent across workers
(otherwise there is no serialization).
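The failure mode can be illustrated with a small model. This is a simplified sketch, not Kryo's implementation: it assumes that disabling the Java serialization fallback makes class registration mandatory, and it mirrors the bottom of the stack trace, where a map serializer looks up the class of each key and value before writing it. StrictRegistrationModel and FakeNodeInfo are hypothetical names used only for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: this is NOT Kryo's implementation.
class StrictRegistrationModel {
    // Hypothetical stand-in for org.apache.storm.generated.NodeInfo.
    static final class FakeNodeInfo {
        final String node;
        final int port;

        FakeNodeInfo(String node, int port) {
            this.node = node;
            this.port = port;
        }
    }

    private final Set<Class<?>> registered = new HashSet<>();
    private final boolean registrationRequired;

    StrictRegistrationModel(boolean registrationRequired) {
        this.registrationRequired = registrationRequired;
        // A few types are assumed to be known up front.
        registered.add(String.class);
        registered.add(Integer.class);
    }

    void register(Class<?> type) {
        registered.add(type);
    }

    // Checks the class of every key and value, then returns the entry count.
    int writeMap(Map<?, ?> map) {
        int entries = 0;
        for (Map.Entry<?, ?> e : map.entrySet()) {
            writeClass(e.getKey());
            writeClass(e.getValue());
            entries++;
        }
        return entries;
    }

    private void writeClass(Object value) {
        Class<?> type = value.getClass();
        if (registrationRequired && !registered.contains(type)) {
            throw new IllegalArgumentException("Class is not registered: " + type.getName());
        }
    }

    public static void main(String[] args) {
        // Shaped like the __send-iconnection metric: a map keyed by node info.
        Map<Object, Object> sendConnection = new LinkedHashMap<>();
        sendConnection.put(new FakeNodeInfo("node-a", 6701), 1896);

        StrictRegistrationModel strict = new StrictRegistrationModel(true);
        try {
            strict.writeMap(sendConnection);
        } catch (IllegalArgumentException e) {
            System.out.println("strict, unregistered: " + e.getMessage());
        }

        strict.register(FakeNodeInfo.class);
        System.out.println("strict, registered: wrote " + strict.writeMap(sendConnection) + " entry");
    }
}
```

In this model the write only fails when registration is required and the key class is unknown, which matches the two conditions in the report: the fallback is disabled and the tuple actually has to be serialized.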
The workaround is one of the following:
1) add org.apache.storm.generated.NodeInfo to topology.kryo.register in the
topology conf
2) set topology.fall.back.on.java.serialization to true, or leave it unset,
since the default is true
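As a rough sketch, the two workarounds amount to the following changes to the topology conf. The example uses a plain java.util.Map so that it runs without Storm on the classpath; the class and method names (NodeInfoWorkaround, registerNodeInfo, enableJavaFallback) are hypothetical, and only the config keys and the NodeInfo class name come from this report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; models the topology conf as a plain map.
class NodeInfoWorkaround {
    static final String NODE_INFO = "org.apache.storm.generated.NodeInfo";

    // Workaround 1: list NodeInfo under topology.kryo.register,
    // keeping any entries that are already present.
    static Map<String, Object> registerNodeInfo(Map<String, Object> conf) {
        Map<String, Object> out = new HashMap<>(conf);
        List<Object> registered = new ArrayList<>();
        Object existing = out.get("topology.kryo.register");
        if (existing instanceof List) {
            registered.addAll((List<?>) existing);
        }
        if (!registered.contains(NODE_INFO)) {
            registered.add(NODE_INFO);
        }
        out.put("topology.kryo.register", registered);
        return out;
    }

    // Workaround 2: allow the fallback to Java serialization.
    static Map<String, Object> enableJavaFallback(Map<String, Object> conf) {
        Map<String, Object> out = new HashMap<>(conf);
        out.put("topology.fall.back.on.java.serialization", true);
        return out;
    }

    public static void main(String[] args) {
        // The configuration that triggers the failure in this report.
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.fall.back.on.java.serialization", false);

        System.out.println(registerNodeInfo(conf));
        System.out.println(enableJavaFallback(conf));
    }
}
```

In a real topology the same keys would normally be set on org.apache.storm.Config, which also provides the registerSerialization and setFallBackOnJavaSerialization helpers for these two settings.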
The fix is to register the NodeInfo class with Kryo in SerializationFactory:
https://github.com/apache/storm/blob/7bef73a6faa14558ef254efe74cbe4bfef81c2e2/storm-client/src/jvm/org/apache/storm/serialization/SerializationFactory.java#L67-L77