[ https://issues.apache.org/jira/browse/STORM-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17526868#comment-17526868 ]

Radhika commented on STORM-3779:
--------------------------------

+1 to prioritize this bug fix

We recently migrated to v2.2.0, and this issue is seriously disrupting our Storm 
topology processes, which is affecting production runs.

Please also suggest any workarounds or clean-up scripts that need to be run 
until the bug is fixed.
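While the bug is open, one way to find supervisors in this state is to scan each supervisor log for topologies whose `stormconf.ser` blob has already been deleted (`Deleted blob: <topology-id>-stormconf.ser (REMOVED FROM CLUSTER).`) but which keep triggering `WrappedNotAliveException: <topology-id> does not appear to be alive`, as in the log quoted in the issue below. This is a minimal sketch, not part of Storm: `StuckWorkerScanner` and `findStuckTopologies` are hypothetical names, it assumes each log entry sits on a single line (the email wraps them), and it only reports candidates. The leftover worker processes still have to be stopped by hand.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds topologies whose config was deleted but whose workers still heartbeat. */
public final class StuckWorkerScanner {
    // Matches "Deleted blob: <topoId>-stormconf.ser (REMOVED FROM CLUSTER)"
    private static final Pattern CONF_DELETED =
        Pattern.compile("Deleted blob: (\\S+)-stormconf\\.ser \\(REMOVED FROM CLUSTER\\)");
    // Matches "WrappedNotAliveException: <topoId> does not appear to be alive"
    private static final Pattern NOT_ALIVE =
        Pattern.compile("WrappedNotAliveException: (\\S+) does not appear to be alive");

    private StuckWorkerScanner() {}

    /**
     * Returns, for each topology whose stormconf.ser was deleted earlier in the log,
     * how many heartbeats were rejected with WrappedNotAliveException afterwards.
     */
    public static Map<String, Integer> findStuckTopologies(List<String> logLines) {
        Set<String> configDeleted = new TreeSet<>();
        Map<String, Integer> rejectedAfterDelete = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher deleted = CONF_DELETED.matcher(line);
            if (deleted.find()) {
                configDeleted.add(deleted.group(1));
                continue;
            }
            Matcher notAlive = NOT_ALIVE.matcher(line);
            // Only count rejections that happen after the topology's config was removed
            if (notAlive.find() && configDeleted.contains(notAlive.group(1))) {
                rejectedAfterDelete.merge(notAlive.group(1), 1, Integer::sum);
            }
        }
        return rejectedAfterDelete;
    }
}
```

For the log excerpt in the issue below, once each entry is unwrapped onto one line, this would report `TOPOLOGY_NAME` with 4 rejected heartbeats.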

> Killed topology worker is not removed, with a warning and error: "Topology 
> config is not localized yet..."
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: STORM-3779
>                 URL: https://issues.apache.org/jira/browse/STORM-3779
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.2.0
>            Reporter: sanghee park
>            Priority: Major
>
> Hi developers,
> We hit a critical issue when killing a Storm topology.
>  
> We killed the topology as follows:
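> {code:java}
> import java.util.Collections;
> import org.apache.storm.Config;
> import org.apache.storm.generated.KillOptions;
> import org.apache.storm.generated.Nimbus;
> import org.apache.storm.utils.NimbusClient;
>
> Config conf = new Config();
> // nimbus.seeds takes a list of Nimbus host names
> conf.put(Config.NIMBUS_SEEDS, Collections.singletonList("SOME_NIMBUS_SEED_STRING"));
>
> // Wait 10 seconds between deactivating the topology and shutting down its workers
> KillOptions opt = new KillOptions();
> opt.set_wait_secs(10); // the generated setter also marks wait_secs as set
>
> // NimbusClient is AutoCloseable, so the Thrift connection is closed when done
> try (NimbusClient client = NimbusClient.getConfiguredClient(conf)) {
>     Nimbus.Iface nimbusClient = client.getClient();
>     nimbusClient.killTopologyWithOpts("TOPOLOGY_NAME", opt);
> }
> {code}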
>  
> The topology's workers were distributed across multiple supervisors.
> On some supervisors, the workers shut down normally.
>  
> However, on other supervisors:
> h3. *The workers never died, and the supervisor logged errors like the following:*
>  
> {noformat}
> 2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] SET worker-user 
> baef41a4-b5f6-4ea3-8868-5537dfba82f8 root
> 2021-06-29 02:58:44.284 o.a.s.d.s.Container SLOT_6707 [INFO] Creating 
> symlinks for worker-id: baef41a4-b5f6-4ea3-8868-5537dfba82f8 storm-id: 
> TOPOLOGY_NAME for files(1): [resources]
> 2021-06-29 02:58:44.284 o.a.s.d.s.BasicContainer SLOT_6707 [INFO] Launching 
> worker with assignment LocalAssignment(topology_id:TOPOLOGY_NAME, 
> executors:[ExecutorInfo(task_start:17, task_end:17), 
> ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, 
> task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, 
> cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
> resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, 
> cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) for this 
> supervisor d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14 on port 6707 
> with id baef41a4-b5f6-4ea3-8868-5537dfba82f8
> 2021-06-29 02:58:44.285 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE 
> kill-and-relaunch msInState: 6 topo:TOPOLOGY_NAME 
> worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> waiting-for-worker-start 
> msInState: 0 topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
> 2021-06-29 02:58:44.286 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE 
> kill-and-relaunch msInState: 7 topo:TOPOLOGY_NAME 
> worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> waiting-for-worker-start 
> msInState: 0 topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
> 2021-06-29 02:58:46.799 o.a.s.d.s.BasicContainer Thread-7269 [INFO] Worker 
> Process d06bb5c5-25e2-4557-8996-4d40045022d1 exited with code: 254
> 2021-06-29 02:58:48.065 o.a.s.d.s.BasicContainer Thread-7270 [INFO] Worker 
> Process baef41a4-b5f6-4ea3-8868-5537dfba82f8 exited with code: 254
> 2021-06-29 02:59:09.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] 
> Running supervisor healthchecks...
> 2021-06-29 02:59:09.234 o.a.s.h.HealthChecker timer [INFO] The supervisor 
> healthchecks succeeded.
> 2021-06-29 02:59:39.234 o.a.s.d.s.t.SupervisorHealthCheck timer [INFO] 
> Running supervisor healthchecks...
> 2021-06-29 02:59:39.234 o.a.s.h.HealthChecker timer [INFO] The supervisor 
> healthchecks succeeded.
> 2021-06-29 02:59:53.558 o.a.s.d.s.Supervisor pool-11-thread-9 [INFO] Got an 
> assignments from master, will start to sync with assignments: 
> SupervisorAssignments(...)
> 2021-06-29 02:59:53.936 o.a.s.d.s.Slot SLOT_6702 [INFO] SLOT 6702: Assignment 
> Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
> executors:[ExecutorInfo(task_start:23, task_end:23), 
> ExecutorInfo(task_start:11, task_end:11)], 
> resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, 
> shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
> resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, 
> cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
> 2021-06-29 02:59:53.939 o.a.s.d.s.Container SLOT_6702 [INFO] Killing 
> d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:25976cac-9170-44ec-b835-099377cda893
> 2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708: Assignment 
> Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
> executors:[ExecutorInfo(task_start:10, task_end:10), 
> ExecutorInfo(task_start:22, task_end:22)], 
> resources:WorkerResources(mem_on_heap:3200.0, mem_off_heap:0.0, cpu:20.0, 
> shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
> resources:{offheap.memory.mb=0.0, onheap.memory.mb=3200.0, 
> cpu.pcore.percent=20.0}, shared_resources:{}), owner:root) to null
> 2021-06-29 02:59:54.293 o.a.s.d.s.Slot SLOT_6707 [INFO] SLOT 6707: Assignment 
> Changed from LocalAssignment(topology_id:TOPOLOGY_NAME, 
> executors:[ExecutorInfo(task_start:17, task_end:17), 
> ExecutorInfo(task_start:29, task_end:29), ExecutorInfo(task_start:5, 
> task_end:5)], resources:WorkerResources(mem_on_heap:6272.0, mem_off_heap:0.0, 
> cpu:30.0, shared_mem_on_heap:0.0, shared_mem_off_heap:0.0, 
> resources:{offheap.memory.mb=0.0, onheap.memory.mb=6272.0, 
> cpu.pcore.percent=30.0}, shared_resources:{}), owner:root) to null
> 2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6708 [INFO] STATE 
> waiting-for-worker-start msInState: 70011 topo:TOPOLOGY_NAME 
> worker:d06bb5c5-25e2-4557-8996-4d40045022d1 -> kill msInState: 0 
> topo:TOPOLOGY_NAME worker:d06bb5c5-25e2-4557-8996-4d40045022d1
> 2021-06-29 02:59:54.296 o.a.s.d.s.Slot SLOT_6707 [INFO] STATE 
> waiting-for-worker-start msInState: 70010 topo:TOPOLOGY_NAME 
> worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8 -> kill msInState: 0 
> topo:TOPOLOGY_NAME worker:baef41a4-b5f6-4ea3-8868-5537dfba82f8
> 2021-06-29 02:59:54.298 o.a.s.d.s.Slot SLOT_6708 [INFO] SLOT 6708 all 
> processes are dead...
> 2021-06-29 02:59:54.298 o.a.s.d.s.Container SLOT_6708 [INFO] Cleaning up 
> d2ee514a-e40e-40fb-b119-59763f3bb95d-10.233.112.14:d06bb5c5-25e2-4557-8996-4d40045022d1
> 2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting 
> path /storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/pids/141225
> 2021-06-29 02:59:54.298 o.a.s.d.s.AdvancedFSOps SLOT_6708 [INFO] Deleting 
> path /storm/workers/d06bb5c5-25e2-4557-8996-4d40045022d1/heartbeats
> 2021-06-29 03:00:06.452 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar
> 2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormjar.jar.version
> 2021-06-29 03:00:06.472 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME/resources
> 2021-06-29 03:00:06.472 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
> Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormjar.jar (REMOVED 
> FROM CLUSTER).
> 2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser
> 2021-06-29 03:00:06.475 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormconf.ser.version
> 2021-06-29 03:00:06.475 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
> Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormconf.ser (REMOVED 
> FROM CLUSTER).
> 2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser
> 2021-06-29 03:00:06.477 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path 
> /storm/supervisor/stormdist/TOPOLOGY_NAME/stormcode.ser.version
> 2021-06-29 03:00:06.478 o.a.s.l.LocalizedResourceRetentionSet AsyncLocalizer 
> Task Executor - 1 [INFO] Deleted blob: TOPOLOGY_NAME-stormcode.ser (REMOVED 
> FROM CLUSTER).
> 2021-06-29 03:00:06.478 o.a.s.d.s.AdvancedFSOps AsyncLocalizer Task Executor 
> - 1 [INFO] Deleting path /storm/supervisor/stormdist/TOPOLOGY_NAME
> 2021-06-29 03:00:07.062 o.a.s.d.s.Supervisor pool-11-thread-10 [WARN] 
> Topology config is not localized yet...
> 2021-06-29 03:00:07.063 o.a.s.t.ProcessFunction pool-11-thread-10 [ERROR] 
> Internal error processing sendSupervisorWorkerHeartbeat
> org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not 
> appear to be alive, you should probably exit
>         at 
> org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
>  ~[storm-server-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
>  [storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
>  [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> [?:?]
>         at java.lang.Thread.run(Unknown Source) [?:?]
> 2021-06-29 03:00:07.064 o.a.s.t.ProcessFunction pool-11-thread-3 [ERROR] 
> Internal error processing sendSupervisorWorkerHeartbeat
> org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not 
> appear to be alive, you should probably exit
>         at 
> org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
>  ~[storm-server-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
>  [storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
>  [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> [?:?]
>         at java.lang.Thread.run(Unknown Source) [?:?]
> 2021-06-29 03:00:08.106 o.a.s.d.s.Supervisor pool-11-thread-9 [WARN] Topology 
> config is not localized yet...
> 2021-06-29 03:00:08.107 o.a.s.t.ProcessFunction pool-11-thread-9 [ERROR] 
> Internal error processing sendSupervisorWorkerHeartbeat
> org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not 
> appear to be alive, you should probably exit
>         at 
> org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
>  ~[storm-server-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
>  [storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
>  [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> [?:?]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
> [?:?]
>         at java.lang.Thread.run(Unknown Source) [?:?]
> 2021-06-29 03:00:08.108 o.a.s.d.s.Supervisor pool-11-thread-16 [WARN] 
> Topology config is not localized yet...
> 2021-06-29 03:00:08.108 o.a.s.t.ProcessFunction pool-11-thread-16 [ERROR] 
> Internal error processing sendSupervisorWorkerHeartbeat
> org.apache.storm.utils.WrappedNotAliveException: TOPOLOGY_NAME does not 
> appear to be alive, you should probably exit
>         at 
> org.apache.storm.daemon.supervisor.Supervisor$1.sendSupervisorWorkerHeartbeat(Supervisor.java:448)
>  ~[storm-server-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:374)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.generated.Supervisor$Processor$sendSupervisorWorkerHeartbeat.getResult(Supervisor.java:353)
>  ~[storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:38) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:172)
>  [storm-client-2.2.0.jar:2.2.0]
>         at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:524)
>  [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> [storm-shaded-deps-2.2.0.jar:2.2.0]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
> [?:?]{noformat}
> *These errors repeated indefinitely until we killed the worker process ourselves.*
>  
>  
>  
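As far as we understand Storm 2.x, `killTopologyWithOpts` returns once Nimbus has marked the topology as killed; Nimbus removes the topology only after `wait_secs` elapses, and the supervisors then tear down its workers asynchronously. A client that wants to block until Nimbus has dropped the topology could poll the topology list. The sketch below keeps the polling loop free of Storm dependencies by taking a `Supplier` of topology names; `KillWaiter` and `waitUntilRemoved` are hypothetical names. In real code the supplier could call `getClusterInfo().get_topologies()` on the `Nimbus.Iface` client and map each `TopologySummary` to `get_name()`, rethrowing the checked `TException`. Note that this only observes Nimbus: in the supervisor log above, Nimbus had already removed the topology while the stuck workers kept running, so this check would not have caught the problem reported here.

```java
import java.time.Duration;
import java.util.Collection;
import java.util.function.Supplier;

/** Hypothetical helper: waits until a topology no longer appears in a topology-name listing. */
public final class KillWaiter {
    private KillWaiter() {}

    /**
     * Polls topologyNames until topologyName is absent or the timeout elapses.
     * Returns true if the topology disappeared in time, false on timeout or interruption.
     */
    public static boolean waitUntilRemoved(Supplier<? extends Collection<String>> topologyNames,
                                           String topologyName,
                                           Duration timeout,
                                           Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // Assumption: in real use this supplier queries Nimbus for the current topology names
            if (!topologyNames.get().contains(topologyName)) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // still listed after the timeout
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
                return false;
            }
        }
    }
}
```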



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
