[
https://issues.apache.org/jira/browse/STORM-3501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diogo Monteiro updated STORM-3501:
----------------------------------
Description:
I was trying to launch a topology that I'm developing (in 2.0.0) and noticed
that the worker was getting restarted every ~30 seconds.
I placed a breakpoint in the _kill_ method of _LocalContainer_
([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/LocalContainer.java#L66])
to try and understand why the worker was getting restarted.
The call stack was:
{{kill:66, LocalContainer (org.apache.storm.daemon.supervisor)}}
{{killContainerFor:269, Slot (org.apache.storm.daemon.supervisor)}}
{{handleRunning:724, Slot (org.apache.storm.daemon.supervisor)}}
{{stateMachineStep:218, Slot (org.apache.storm.daemon.supervisor)}}
{{run:931, Slot (org.apache.storm.daemon.supervisor)}}
From this I understand that the worker is killed because a blob has
changed
([https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/daemon/supervisor/Slot.java#L724]).
In fact, there's a changing blob in the _dynamicState_ at that point.
I checked _AsyncLocalizer_, which downloads blobs, caches them locally, and
notifies the Slot state machine of a changing blob.
I noticed this:
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L339]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/AsyncLocalizer.java#L265]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L142]
* [https://github.com/apache/storm/blob/2ba95bbd1c911d4fc6363b1c4b9c4c6d86ac9aae/storm-server/src/main/java/org/apache/storm/localizer/LocallyCachedTopologyBlob.java#L192]
These tell me that (correct me if I'm wrong):
* The Supervisor tries to update blobs every 30 seconds.
* The topology jar blob requires extraction of the resources directory (either
from a jar or directly from a classpath URL). This is done in
_fetchUnzipToTemp_, and its existence is checked in _isFullyDownloaded_.
* The Slot is notified of a changing blob if:
** the remote version is different from the local version (the code has
changed),
** OR the blob is not fully downloaded (fully downloaded means that both the
jar and the extracted resources directory exist).
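The condition described in the list above can be sketched as follows. This is a minimal illustration, not the actual Storm code: the class {{BlobUpdateSketch}}, the method {{isChanging}}, the simplified {{isFullyDownloaded}} signature, and the file names are all hypothetical stand-ins, and the version numbers are made up.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class BlobUpdateSketch {

    // Hypothetical stand-in for the "fully downloaded" check described above:
    // both the jar and the extracted resources directory must exist.
    static boolean isFullyDownloaded(Path topologyJar, Path extractedResourcesDir) {
        return Files.exists(topologyJar) && Files.isDirectory(extractedResourcesDir);
    }

    // Hypothetical model of the decision to report a blob as changing:
    // either the versions differ, or the local copy is not fully downloaded.
    static boolean isChanging(long localVersion, long remoteVersion,
                              Path topologyJar, Path extractedResourcesDir) {
        return localVersion != remoteVersion
            || !isFullyDownloaded(topologyJar, extractedResourcesDir);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blob-sketch");
        Path jar = Files.createFile(dir.resolve("stormjar.jar"));
        Path resources = dir.resolve("resources");

        // Versions match, but there is no resources directory:
        // the blob is reported as changing on every update cycle.
        System.out.println(isChanging(1L, 1L, jar, resources)); // prints "true"

        // Once the resources directory exists, the same check passes.
        Files.createDirectory(resources);
        System.out.println(isChanging(1L, 1L, jar, resources)); // prints "false"
    }
}
```

Under these assumptions, a topology jar without a resources directory can never satisfy the second condition, so the check reports a changing blob on every run even when the code itself is unchanged.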
Well, I did not have a resources folder under the root of the classpath, and
that's why the worker was being restarted every ~30 seconds, as the Slot was
being notified of a changing blob every time _updateBlobs_ ran.
I created a resources folder (with dummy files) under the root of the
classpath and the problem is now solved.
However, if I understand correctly, the resources folder is only required for
_multilang_. Our topologies do not use _multilang_, and this does not happen in
Storm 1.1.3, for instance.
Happy to submit MR.
> Local Cluster worker restarts
> -----------------------------
>
> Key: STORM-3501
> URL: https://issues.apache.org/jira/browse/STORM-3501
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-server
> Affects Versions: 2.0.0, 2.1.0
> Environment: Linux
> Reporter: Diogo Monteiro
> Priority: Minor
--
This message was sent by Atlassian Jira
(v8.3.2#803003)