Which Storm version are you using? Is this happening outside human-triggered deployments? There was a recent fix related to unwanted topology redeployments:
https://github.com/apache/storm/pull/3697
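
If you are not sure which version the daemons are actually running, `storm version` on each node reports the locally installed client, and the Nimbus hosts report their own version over the Thrift API. A minimal sketch of the latter, assuming the standard Storm 2.x Thrift client (the class name here is only for illustration):

```java
import java.util.Map;
import org.apache.storm.generated.ClusterSummary;
import org.apache.storm.generated.NimbusSummary;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class PrintNimbusVersions {
    public static void main(String[] args) throws Exception {
        // Read storm.yaml / defaults from the classpath, then ask the configured Nimbus.
        Map<String, Object> conf = Utils.readStormConfig();
        try (NimbusClient client = NimbusClient.getConfiguredClient(conf)) {
            ClusterSummary summary = client.getClient().getClusterInfo();
            // Each Nimbus host advertises the version it was built from.
            for (NimbusSummary nimbus : summary.get_nimbuses()) {
                System.out.println(nimbus.get_host() + ":" + nimbus.get_port()
                        + " -> " + nimbus.get_version());
            }
        }
    }
}
```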

On Mon, 17 Feb 2025 at 09:22, silent5945 (via GitHub) <g...@apache.org> wrote:

> silent5945 opened a new issue, #7966:
> URL: https://github.com/apache/storm/issues/7966
>
> We are having a systematic issue in a production cluster with topology workers being intentionally killed by the supervisor, while we want to absolutely avoid that. From the nimbus/supervisor logs it is clear that the topologies' workers are being restarted because of a topology blob update:
>
> ```
> 2025-02-12 20:37:13.354 o.a.s.d.s.Container [INFO] Killing b573591b-fc08-4e6a-93e3-9857c6a64676-10.41.123.13:bfd5e3f0-2900-4d59-896e-77fa8e15b7a4
> 2025-02-12 20:37:23.362 o.a.s.d.s.Slot [INFO] STATE running msInState: 93944773 topo:TOPO_A worker:bfd5e3f0-2900-4d59-896e-77fa8e15b7a4 -> kill-blob-update msInState: 10001 topo:TOPO_A worker:bfd5e3f0-2900-4d59-896e-77fa8e15b7a4
> 2025-02-12 20:37:33.682 o.a.s.d.s.Slot [INFO] STATE kill-blob-update msInState: 20321 topo:TOPO_A worker:null -> waiting-for-blob-update msInState: 1
> 2025-02-13 01:37:38.064 o.a.s.d.s.Container [INFO] Killing b573591b-fc08-4e6a-93e3-9857c6a64676-10.41.123.13:e11d8695-82ef-4d75-a32e-fc1f5d4c3fff
> 2025-02-13 01:37:48.068 o.a.s.d.s.Slot [INFO] STATE running msInState: 322136 topo:TOPO_B worker:e11d8695-82ef-4d75-a32e-fc1f5d4c3fff -> kill-blob-update msInState: 10001 topo:TOPO_B worker:e11d8695-82ef-4d75-a32e-fc1f5d4c3fff
> 2025-02-13 01:37:58.081 o.a.s.d.s.Slot [INFO] STATE kill-blob-update msInState: 20014 topo:TOPO_B worker:null -> waiting-for-blob-update msInState: 0
> 2025-02-13 03:38:03.503 o.a.s.d.s.Container [INFO] Killing b573591b-fc08-4e6a-93e3-9857c6a64676-10.41.123.13:7b26a36d-c629-4d7a-bb62-e291766bad23
> 2025-02-13 03:38:13.506 o.a.s.d.s.Slot [INFO] STATE running msInState: 349122 topo:TOPO_C worker:7b26a36d-c629-4d7a-bb62-e291766bad23 -> kill-blob-update msInState: 10000 topo:TOPO_C worker:7b26a36d-c629-4d7a-bb62-e291766bad23
> 2025-02-13 03:38:23.518 o.a.s.d.s.Slot [INFO] STATE kill-blob-update msInState: 20012 topo:TOPO_C worker:null -> waiting-for-blob-update msInState: 0
> 2025-02-13 07:38:26.391 o.a.s.d.s.Container [INFO] Killing b573591b-fc08-4e6a-93e3-9857c6a64676-10.41.123.13:dec61a1e-3e6b-43eb-ba01-9778f1273fe8
> 2025-02-13 07:38:36.395 o.a.s.d.s.Slot [INFO] STATE running msInState: 373132 topo:TOPO_D worker:dec61a1e-3e6b-43eb-ba01-9778f1273fe8 -> kill-blob-update msInState: 10000 topo:TOPO_D worker:dec61a1e-3e6b-43eb-ba01-9778f1273fe8
> 2025-02-13 07:38:46.409 o.a.s.d.s.Slot [INFO] STATE kill-blob-update msInState: 20014 topo:TOPO_D worker:null -> waiting-for-blob-update msInState: 0
> 2025-02-13 12:38:49.243 o.a.s.d.s.Container [INFO] Killing b573591b-fc08-4e6a-93e3-9857c6a64676-10.41.123.13:6e835846-128e-45bc-82ad-a78895c20512
> 2025-02-13 12:38:59.246 o.a.s.d.s.Slot [INFO] STATE running msInState: 394136 topo:TOPO_E worker:6e835846-128e-45bc-82ad-a78895c20512 -> kill-blob-update msInState: 10000 topo:TOPO_E worker:6e835846-128e-45bc-82ad-a78895c20512
> 2025-02-13 12:39:09.260 o.a.s.d.s.Slot [INFO] STATE kill-blob-update msInState: 20014 topo:TOPO_E worker:null -> waiting-for-blob-update msInState: 1
> ```
>
> Indeed we see in the nimbus log that the topology blobs are being updated right before the killing:
>
> ```
> 2025-02-12 20:30:03.329 o.a.s.d.n.Nimbus [INFO] Downloading 10 entries
> 2025-02-12 20:30:03.476 o.a.s.c.StormClusterStateImpl [INFO] set-path: /blobstore/TOPO_A-stormjar.jar/nimbus-1:6627-2
> 2025-02-12 20:30:03.551 o.a.s.c.StormClusterStateImpl [INFO] set-path: /blobstore/TOPO_A-stormconf.ser/nimbus-1:6627-2
> 2025-02-12 20:30:04.732 o.a.s.c.StormClusterStateImpl [INFO] set-path: /blobstore/dep-b3e0e136-1c95-42e9-803e-676b4a8e972d.jar/nimbus-1:6627-3
> 2025-02-12 20:30:04.806 o.a.s.d.n.Nimbus [INFO] No more blobs to list for session d2118791-893d-4f91-89c9-8b529a20782c
> 2025-02-12 20:30:07.005 o.a.s.d.n.Nimbus [INFO] Downloading 10 entries
> 2025-02-12 20:30:07.124 o.a.s.d.n.Nimbus [INFO] No more blobs to list for session 4636c761-251a-428c-8ca0-adca6110d059
> ...
> 2025-02-12 20:37:11.304 o.a.s.d.n.Nimbus [INFO] Created download session 47588811-8920-450e-a2e7-aa70241bc650 for TOPO_A-stormconf.ser
> 2025-02-12 20:37:11.337 o.a.s.d.n.Nimbus [INFO] Created download session 2de876d7-d4b0-46c7-b2d1-52db2100c1a5 for TOPO_A-stormjar.jar
> 2025-02-12 20:37:11.381 o.a.s.d.n.Nimbus [INFO] Created download session 8baabff7-c8bc-49a7-93e8-35b576457d28 for dep-b3e0e136-1c95-42e9-803e-676b4a8e972d.jar
> ```
>
> But we are not interacting with the blobstore in any way: there is no blobstore map defined or configured, we only submit the topology jar and that's it. Nothing in our system is doing the update, neither through the API nor through the filesystem. A couple of things to note:
>
> - this is not caused by a Storm / Zookeeper failover or a new leader election
> - this is not caused by a topology rebalance
> - this is not coming from the Storm configuration `supervisor.localizer.cache.target.size.mb = 10240`; the local storage seems to be clean and stable, between 300 MB and 1 GB
> - there is no error in Zookeeper around those times
> - it seems to be periodic (from the timestamps), the first time happening after 1d2h for a permanent topology (TOPO_A), but also happening for the other topologies, which are short-lived (10 mins)
>
> So I have a couple of questions to try and understand what's happening here and hopefully prevent the worker restarts:
>
> 1. Are there any mechanisms internal to Storm that will update a topology's blobs?
> 2. Can the blob update be caused by a `storm blobstore list` command?
> 3. Is it possible to prevent blob updates entirely at topology level?
> 4. Can the blob update be caused by a Zookeeper periodic cleanup somehow?
>
> Thank you very much in advance for any help.
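
As far as I understand the supervisor's Slot state machine, a slot only jumps to kill-blob-update when the version of one of the topology's blobs changes in the blobstore, so one way to narrow this down is to log those versions over time and see whether they really move right before the restarts. Below is a minimal sketch, assuming the Storm 2.x Thrift blob API (`getBlobMeta`); the keys are copied from your logs, so replace them with the real keys reported by `storm blobstore list`:

```java
import java.util.Map;
import org.apache.storm.generated.ReadableBlobMeta;
import org.apache.storm.utils.NimbusClient;
import org.apache.storm.utils.Utils;

public class WatchBlobVersions {
    // Blob keys copied from the logs above; substitute the actual keys
    // that `storm blobstore list` shows for your cluster.
    private static final String[] KEYS = {
        "TOPO_A-stormjar.jar",
        "TOPO_A-stormconf.ser",
        "dep-b3e0e136-1c95-42e9-803e-676b4a8e972d.jar"
    };

    public static void main(String[] args) throws Exception {
        Map<String, Object> conf = Utils.readStormConfig();
        try (NimbusClient client = NimbusClient.getConfiguredClient(conf)) {
            while (true) {
                for (String key : KEYS) {
                    ReadableBlobMeta meta = client.getClient().getBlobMeta(key);
                    // A change in this version is what the supervisors react to
                    // when they restart a topology's workers.
                    System.out.println(System.currentTimeMillis() + " " + key
                            + " version=" + meta.get_version());
                }
                Thread.sleep(60_000L);
            }
        }
    }
}
```

If the logged version jumps right before a kill-blob-update transition, something really is rewriting the blobs (which would match the `set-path: /blobstore/...` entries in your nimbus log); if it stays flat, the restarts are being triggered some other way.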