[
https://issues.apache.org/jira/browse/STORM-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886341#comment-17886341
]
Pedro Azevedo commented on STORM-4077:
--------------------------------------
PR added -> https://github.com/apache/storm/pull/3697
> Worker being reassigned when Nimbus leadership changes
> ------------------------------------------------------
>
> Key: STORM-4077
> URL: https://issues.apache.org/jira/browse/STORM-4077
> Project: Apache Storm
> Issue Type: New Feature
> Affects Versions: 2.6.1
> Reporter: Pedro Azevedo
> Priority: Major
>
> Hey guys, I'm using Storm v2.6.1 and every time I restart the nimbus leader
> (currently I have 3 for high availability) the workers get reassigned and
> this is a bad behaviour as every topology will have no workers running for a
> certain period(until new workers are assigned) due to a Nimbus leadership
> change.
> Update:
> Essentially, by using the modTime as the version, we have found that, while
> using theĀ {{{}LocalFsBlobStoreFile{}}}, everytime the the Nimbus leader goes
> down the following occurs:
> # Nimbus (1) leader goes down and a new Nimbus (2) picks up the leadership.
> # If blobs in Nimbus (2) have a different modTime workers are restarted
> (even though they might be the same).
> # Nimbus (1) comes back up, syncs the blobs in the startup and updates the
> modTime, as it downloads the blobs again.
> # If Nimbus (2) leader goes down, all the workers will be restarted again as
> Nimbus (1) has new modTime again.
> # This can be repeated endless as the modTime will always be different in
> each Nimbus leader.
> We suggest a new method that obtains the file version:
> {code:java}
> public abstract class BlobStoreFile {
> public abstract long getModTime() throws IOException;
> public long getVersion() throws IOException {
> return getModTime();
> }
> }{code}
> And defaults to the current approach if not implemented and the version of
> the file would be something in the lines:
> {code:java}
> public long getVersion() throws IOException {
> byte[] bytes = DigestUtils.sha1(new FileInputStream(path));
> return Arrays.hashCode(bytes);
> } {code}
> Soon, I'll open the PR and link it here.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)