sanghee park created STORM-3779:
-----------------------------------
Summary: Workers of a killed topology are not removed, with warning and
error "Topology config is not localized yet..."
Key: STORM-3779
URL: https://issues.apache.org/jira/browse/STORM-3779
Project: Apache Storm
Issue Type: Bug
Affects Versions: 2.2.0
Reporter: sanghee park
Hi developers,
We hit a critical issue when killing a Storm topology.
We killed the topology as shown below.
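{code:java}
import org.apache.storm.Config;
import org.apache.storm.generated.KillOptions;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.utils.NimbusClient;

Config conf = new Config();
conf.put(Config.NIMBUS_SEEDS, "SOME_NIMBUS_SEED_STRING");
// Ask Nimbus to kill the topology after a 10-second wait.
KillOptions opt = new KillOptions();
opt.set_wait_secs_isSet(true);
opt.set_wait_secs(10);
Nimbus.Iface nimbusClient = NimbusClient.getConfiguredClient(conf).getClient();
nimbusClient.killTopologyWithOpts("TOPOLOGY_NAME", opt);
{code}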
The topology's workers were distributed across multiple supervisors.
On some supervisors, the workers died normally.
The problem is the following:
h3. *On other supervisors, the workers never died, and the supervisor logged errors like the ones below.*
{noformat}
2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] SET worker-user
baef41a4-b5f6-4ea3-8868-5537dfba82f8 root
2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] Creating symlinks
for worker-id: baef41a4-b5f6-4ea3-8868-5537dfba82f8 storm-id: TOPOLOGY_NAME for
files(1): [resources]
2021-06-29 02:58:44.284 o.a.s.d.s.BasicContainer SLOT_6707 [INFO] Launching
worker with assignment LocalAssignment(topology_id:TOPOLOGY_NAME,
executors:[ExecutorInfo(task_start:17, task_end:17),
ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5,
task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0,
cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0,
resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0,
cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) for this supervisor
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14 on port 6707 with id
baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:44.285 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE kill-and-relaunch
msInState: 6 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1 ->
waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME
worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:58:44.286 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE kill-and-relaunch
msInState: 7 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 ->
waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME
worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:46.799 o.a.s.d.s.BasicContainer Thread-7269 [INFO] Worker
Process d06bb5c5-25e2-4557-8996-4d40045022d1 exited with code: 254
2021-06-29 02:58:48.065 o.a.s.d.s.BasicContainer Thread-7270 [INFO] Worker
Process baef41a4-b5f6-4ea3-8868-5537dfba82f8 exited with code: 254
2021-06-29 02:59:09.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running
supervisor healthchecks...
2021-06-29 02:59:09.234 o.a.s.h.HealthChecker timer [INFO] The supervisor
healthchecks succeeded.
2021-06-29 02:59:39.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running
supervisor healthchecks...
2021-06-29 02:59:39.234 o.a.s.h.HealthChecker timer [INFO] The supervisor
healthchecks succeeded.
2021-06-29 02:59:53.558 o.a.s.d.s.Supervisor pool-11-thread-9 [INFO] Got an
assignments from master, will start to sync with assignments:
SupervisorAssignments(...)
2021-06-29 02:59:53.936 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Assignment
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME,
executors:[ExecutorInfo(task_start:23, task_end:23),
ExecutorInfo(task_start:11, task_end:11)],
resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0,
shared_mem_on_heap:0.0, shared_mem_off_heap:0.0,
resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0,
cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:53.939 o.a.s.d.s.Container SLOT_6702 [INFO] Killing
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:25976cac-9170-44ec-b835-099377cda893
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708: Assignment
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME,
executors:[ExecutorInfo(task_start:10, task_end:10),
ExecutorInfo(task_start:22, task_end:22)],
resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0,
shared_mem_on_heap:0.0, shared_mem_off_heap:0.0,
resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0,
cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6707 [INFO] SLOT 6707: Assignment
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME,
executors:[ExecutorInfo(task_start:17, task_end:17),
ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5,
task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0,
cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0,
resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0,
cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE
waiting-for-worker-start msInState: 70011 topo:TOPOLOGY_NAME
worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> kill msInState: 0
topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE
waiting-for-worker-start msInState: 70010 topo:TOPOLOGY_NAME
worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> kill msInState: 0
topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:59:54.298 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708 all processes
are dead...
2021-06-29 02:59:54.298 o.a.s.d.s.Container SLOT_6708 [INFO] Cleaning up
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path
/storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/pids/141225
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path
/storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/heartbeats
2021-06-29 03:00:06.452 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar.version
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/resources
2021-06-29 03:00:06.472 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormjar.jar (REMOVED FROM
CLUSTER).
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser.version
2021-06-29 03:00:06.475 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormconf.ser (REMOVED
FROM CLUSTER).
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser.version
2021-06-29 03:00:06.478 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormcode.ser (REMOVED
FROM CLUSTER).
2021-06-29 03:00:06.478 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor -
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME
2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] Topology
config is not localized yet...
2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR]
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear
to be alive, you should probably exit
at
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
~[storm-server-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:07.064 o.a.s.t.ProcessFunction pool-11-thread-3 [ERROR]
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear
to be alive, you should probably exit
at
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
~[storm-server-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.106 o.a.s.d.s.Supervisor pool-11-thread-9 [WARN] Topology
config is not localized yet...
2021-06-29 03:00:08.107 o.a.s.t.ProcessFunction pool-11-thread-9 [ERROR]
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear
to be alive, you should probably exit
at
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
~[storm-server-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
[?:?]
at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.108 o.a.s.d.s.Supervisor pool-11-thread-16 [WARN] Topology
config is not localized yet...
2021-06-29 03:00:08.108 o.a.s.t.ProcessFunction pool-11-thread-16 [ERROR]
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear
to be alive, you should probably exit
at
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
~[storm-server-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
~[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
[storm-client-2.2.0.jar:2.2.0]
at
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
[storm-shaded-deps-2.2.0.jar:2.2.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
[?:?]{noformat}
*This error message kept repeating until we killed that worker process.*
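
To show how often the supervisor logs this, a small scan of the supervisor log may help. The following is only a sketch and not part of Storm: the class {{HeartbeatErrorScan}} and its methods are hypothetical names. It assumes that every log entry starts with a timestamp in the format shown above, and that any line without a timestamp (a wrapped line or a stack frame) belongs to the preceding entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HeartbeatErrorScan {
    // Assumption: an entry starts with a timestamp such as "2021-06-29 03:00:07.062".
    private static final Pattern ENTRY_START =
        Pattern.compile("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3} .*");

    /** Joins wrapped continuation lines back onto the entry they belong to. */
    public static List<String> joinEntries(List<String> lines) {
        List<String> entries = new ArrayList<>();
        StringBuilder current = null;
        for (String line : lines) {
            if (ENTRY_START.matcher(line).matches()) {
                // A new timestamp closes the previous entry.
                if (current != null) {
                    entries.add(current.toString());
                }
                current = new StringBuilder(line.trim());
            } else if (current != null) {
                // No timestamp: treat the line as a continuation.
                current.append(' ').append(line.trim());
            }
        }
        if (current != null) {
            entries.add(current.toString());
        }
        return entries;
    }

    /** Counts the entries whose joined text contains the given marker. */
    public static long countEntries(List<String> lines, String marker) {
        return joinEntries(lines).stream().filter(e -> e.contains(marker)).count();
    }

    public static void main(String[] args) {
        // Sample lines copied from the log excerpt above, including the line wrapping.
        List<String> sample = List.of(
            "2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] Topology",
            "config is not localized yet...",
            "2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR]",
            "Internal error processing sendSupervisorWorkerHeartbeat",
            "2021-06-29 03:00:08.106 o.a.s.d.s.Supervisor pool-11-thread-9 [WARN] Topology",
            "config is not localized yet...");
        System.out.println(countEntries(sample, "Topology config is not localized yet"));
        System.out.println(
            countEntries(sample, "Internal error processing sendSupervisorWorkerHeartbeat"));
    }
}
```

To scan a real log instead of the sample, pass the result of {{java.nio.file.Files.readAllLines}} for the supervisor log file to {{countEntries}}. The joining step matters because a plain substring search over raw lines would miss messages that were wrapped in the middle.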