sanghee park created STORM-3779:
-----------------------------------

             Summary: Worker of killed topology is not removed; supervisor logs 
WARN and ERROR ("Topology config is not localized yet...")
                 Key: STORM-3779
                 URL: https://issues.apache.org/jira/browse/STORM-3779
             Project: Apache Storm
          Issue Type: Bug
    Affects Versions: 2.2.0
            Reporter: sanghee park


Hi developers,

We hit a critical issue when killing a Storm topology.

We killed the topology as follows:
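{code:java}
import org.apache.storm.Config;
import org.apache.storm.generated.KillOptions;
import org.apache.storm.generated.Nimbus;
import org.apache.storm.utils.NimbusClient;

// Point the client at Nimbus (placeholder value).
Config conf = new Config();
conf.put(Config.NIMBUS_SEEDS, "SOME_NIMBUS_SEED_STRING");

// Ask Nimbus to wait 10 seconds before shutting the workers down.
KillOptions opt = new KillOptions();
opt.set_wait_secs_isSet(true);
opt.set_wait_secs(10);

// Kill the topology by name through the Nimbus Thrift client.
Nimbus.Iface nimbusClient = NimbusClient.getConfiguredClient(conf).getClient();
nimbusClient.killTopologyWithOpts("TOPOLOGY_NAME", opt);
{code}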

The topology's workers were distributed across multiple supervisors.
The workers on some supervisors died normally.

The problem is this:
h3. *The workers on other supervisors never died, and the supervisor log showed errors like the ones below!*

 
{noformat}
2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] SET worker-user 
baef41a4-b5f6-4ea3-8868-5537dfba82f8 root
2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] Creating symlinks 
for worker-id: baef41a4-b5f6-4ea3-8868-5537dfba82f8 storm-id: TOPOLOGY_NAME for 
files(1): [resources]
2021-06-29 02:58:44.284 o.a.s.d.s.BasicContainer SLOT_6707 [INFO] Launching 
worker with assignment LocalAssignment(topology_id:TOPOLOGY_NAME, 
executors:[ExecutorInfo(task_start:17, task_end:17), 
ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, 
task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, 
cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, 
cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) for this supervisor 
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14 on port 6707 with id 
baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:44.285 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE kill-and-relaunch 
msInState: 6 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> 
waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME 
worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:58:44.286 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE kill-and-relaunch 
msInState: 7 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> 
waiting-for-worker-start msInState: 0 topo:TOPOLOGY_NAME 
worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:58:46.799 o.a.s.d.s.BasicContainer Thread-7269 [INFO] Worker 
Process d06bb5c5-25e2-4557-8996-4d40045022d1 exited with code: 254
2021-06-29 02:58:48.065 o.a.s.d.s.BasicContainer Thread-7270 [INFO] Worker 
Process baef41a4-b5f6-4ea3-8868-5537dfba82f8 exited with code: 254
2021-06-29 02:59:09.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running 
supervisor healthchecks...
2021-06-29 02:59:09.234 o.a.s.h.HealthChecker timer [INFO] The supervisor 
healthchecks succeeded.
2021-06-29 02:59:39.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] Running 
supervisor healthchecks...
2021-06-29 02:59:39.234 o.a.s.h.HealthChecker timer [INFO] The supervisor 
healthchecks succeeded.
2021-06-29 02:59:53.558 o.a.s.d.s.Supervisor pool-11-thread-9 [INFO] Got an 
assignments from master, will start to sync with assignments: 
SupervisorAssignments(...)
2021-06-29 02:59:53.936 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Assignment 
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
executors:[ExecutorInfo(task_start:23, task_end:23), 
ExecutorInfo(task_start:11, task_end:11)], 
resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, 
shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, 
cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:53.939 o.a.s.d.s.Container SLOT_6702 [INFO] Killing 
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:25976cac-9170-44ec-b835-099377cda893
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708: Assignment 
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
executors:[ExecutorInfo(task_start:10, task_end:10), 
ExecutorInfo(task_start:22, task_end:22)], 
resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, 
shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, 
cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6707 [INFO] SLOT 6707: Assignment 
Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
executors:[ExecutorInfo(task_start:17, task_end:17), 
ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, 
task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, 
cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, 
cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) to null
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE 
waiting-for-worker-start msInState: 70011 topo:TOPOLOGY_NAME 
worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> kill msInState: 0 
topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE 
waiting-for-worker-start msInState: 70010 topo:TOPOLOGY_NAME 
worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> kill msInState: 0 
topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
2021-06-29 02:59:54.298 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708 all processes 
are dead...
2021-06-29 02:59:54.298 o.a.s.d.s.Container SLOT_6708 [INFO] Cleaning up 
d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:d06bb5c5-25e2-4557-8996-4d40045022d1
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path 
/storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/pids/141225
2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting path 
/storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/heartbeats
2021-06-29 03:00:06.452 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path 
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar.version
2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/resources
2021-06-29 03:00:06.472 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormjar.jar (REMOVED FROM 
CLUSTER).
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser
2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path 
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser.version
2021-06-29 03:00:06.475 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormconf.ser (REMOVED 
FROM CLUSTER).
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser
2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path 
/storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser.version
2021-06-29 03:00:06.478 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormcode.ser (REMOVED 
FROM CLUSTER).
2021-06-29 03:00:06.478 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor - 
1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME
2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] Topology 
config is not localized yet...
2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR] 
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear 
to be alive, you should probably exit
        at 
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
 ~[storm-server-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
 [storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
 [storm-shaded-deps-2.2.0.jar:2.2.0]
        at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:07.064 o.a.s.t.ProcessFunction pool-11-thread-3 [ERROR] 
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear 
to be alive, you should probably exit
        at 
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
 ~[storm-server-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
 [storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
 [storm-shaded-deps-2.2.0.jar:2.2.0]
        at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.106 o.a.s.d.s.Supervisor pool-11-thread-9 [WARN] Topology 
config is not localized yet...
2021-06-29 03:00:08.107 o.a.s.t.ProcessFunction pool-11-thread-9 [ERROR] 
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear 
to be alive, you should probably exit
        at 
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
 ~[storm-server-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
 [storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
 [storm-shaded-deps-2.2.0.jar:2.2.0]
        at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
[?:?]
        at java.lang.Thread.run(Unknown Source) [?:?]
2021-06-29 03:00:08.108 o.a.s.d.s.Supervisor pool-11-thread-16 [WARN] Topology 
config is not localized yet...
2021-06-29 03:00:08.108 o.a.s.t.ProcessFunction pool-11-thread-16 [ERROR] 
Internal error processing sendSupervisorWorkerHeartbeat
org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not appear 
to be alive, you should probably exit
        at 
org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
 ~[storm-server-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
 ~[storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at 
org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
 [storm-client-2.2.0.jar:2.2.0]
        at 
org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
 [storm-shaded-deps-2.2.0.jar:2.2.0]
        at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
[storm-shaded-deps-2.2.0.jar:2.2.0]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
[?:?]
{noformat}
*This error message repeated indefinitely until we killed that worker process ourselves.*
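
For anyone trying to confirm the same symptom on their own supervisor, the repetition can be quantified from the supervisor log. The following is only a minimal sketch, not part of Storm: the class name {{HeartbeatErrorTally}} and the method {{countPerSecond}} are hypothetical, and it assumes the log line layout shown above (timestamp, logger, thread, level). It counts every {{o.a.s.t.ProcessFunction}} ERROR entry per second, not only the heartbeat ones.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Storm: tallies ProcessFunction ERROR entries
// in a supervisor log, grouped by second.
final class HeartbeatErrorTally {

    // Assumed layout (taken from the log above):
    // "<date> <time>.<millis> o.a.s.t.ProcessFunction <thread> [ERROR] ..."
    private static final Pattern ERROR_LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})\\.\\d{3} "
            + "o\\.a\\.s\\.t\\.ProcessFunction \\S+ \\[ERROR\\]");

    /** Returns the number of matching ERROR entries per second, in log order. */
    static Map<String, Integer> countPerSecond(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.find()) {
                // Group 1 is the timestamp truncated to whole seconds.
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // First lines of the WARN/ERROR entries from the log above.
        List<String> sample = Arrays.asList(
            "2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] Topology ",
            "2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR] ",
            "2021-06-29 03:00:07.064 o.a.s.t.ProcessFunction pool-11-thread-3 [ERROR] ",
            "2021-06-29 03:00:08.107 o.a.s.t.ProcessFunction pool-11-thread-9 [ERROR] ",
            "2021-06-29 03:00:08.108 o.a.s.t.ProcessFunction pool-11-thread-16 [ERROR] ");
        // Prints {2021-06-29 03:00:07=2, 2021-06-29 03:00:08=2}
        System.out.println(countPerSecond(sample));
    }
}
```

A steady nonzero count for a topology that has already been killed would match what we saw. In practice the lines would be read from the supervisor log file rather than from a hard-coded list.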

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
