[ 
https://issues.apache.org/jira/browse/STORM-4077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pedro Azevedo updated STORM-4077:
---------------------------------
    Description: 
Hey guys, I'm using Storm v2.6.1, and every time I restart the Nimbus leader 
(I currently run 3 for high availability) the workers get reassigned. This is 
bad behaviour, because every topology is left with no workers running for a 
certain period (until new workers are assigned) purely due to a Nimbus 
leadership change.

Update:

Essentially, because the modTime is used as the version, we have found that, 
when using {{{}LocalFsBlobStoreFile{}}}, the following occurs every time the 
Nimbus leader goes down:
 # Nimbus (1), the current leader, goes down and a new Nimbus (2) picks up the leadership.
 # If the blobs on Nimbus (2) have a different modTime, the workers are restarted (even 
though the blob contents may be identical).
 # Nimbus (1) comes back up, syncs the blobs during startup, and updates their 
modTime, since it downloads the blobs again.
 # If Nimbus (2) then goes down, all the workers are restarted once more, because 
the blobs on Nimbus (1) again have a newer modTime.
 # This can repeat endlessly, as the modTime will always differ between Nimbus 
leaders.
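
The restart loop above hinges on modTime changing even when the bytes do not. This can be demonstrated with plain JDK calls (a minimal, self-contained sketch; the class and file names are ours, not Storm's): rewriting identical bytes gives the file a new modification time, while a digest of its contents stays identical.
{code:java}
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

public class ModTimeVsDigest {
    // SHA-1 of the file contents; identical bytes always produce identical digests.
    static byte[] sha1(Path p) throws Exception {
        return MessageDigest.getInstance("SHA-1").digest(Files.readAllBytes(p));
    }

    public static void main(String[] args) throws Exception {
        Path blob = Files.createTempFile("blob", ".jar");
        byte[] content = "same topology jar".getBytes("UTF-8");

        Files.write(blob, content);                       // first "download"
        long mod1 = Files.getLastModifiedTime(blob).toMillis();
        byte[] digest1 = sha1(blob);

        Thread.sleep(50);                                 // let the filesystem clock advance
        Files.write(blob, content);                       // second "download", identical bytes
        long mod2 = Files.getLastModifiedTime(blob).toMillis();
        byte[] digest2 = sha1(blob);

        System.out.println("modTime changed: " + (mod1 != mod2));
        System.out.println("digest changed:  " + !Arrays.equals(digest1, digest2));
        Files.deleteIfExists(blob);
    }
}{code}
On a leadership change this is exactly what happens on disk: each Nimbus downloaded the same blob at a different moment, so the modTimes disagree while the contents agree.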

We suggest adding a method to {{BlobStoreFile}} that obtains the file version:
{code:java}
public abstract class BlobStoreFile {
    public abstract long getModTime() throws IOException;

    public long getVersion() throws IOException {
        return getModTime();
    }
}{code}
This defaults to the current modTime-based behaviour when a subclass does not 
override it. For {{{}LocalFsBlobStoreFile{}}}, the version could instead be 
derived from the file contents, along the lines of:
{code:java}
public long getVersion() throws IOException {
    // Hash the blob contents so the version only changes when the bytes change,
    // and close the stream to avoid leaking a file descriptor.
    try (InputStream in = new FileInputStream(path)) {
        byte[] digest = DigestUtils.sha1(in);
        return Arrays.hashCode(digest);
    }
} {code}
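One caveat: {{Arrays.hashCode}} folds the 20-byte SHA-1 digest down to a 32-bit {{int}}, so the returned {{long}} never uses its upper half. A self-contained variant (the class name is ours, not an existing Storm API) that streams the file and packs the first 8 digest bytes into the full 64-bit range:
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ContentVersion {
    /** A 64-bit version derived from the leading 8 bytes of the SHA-1 digest of the file. */
    public static long getVersion(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                md.update(buf, 0, n);     // stream the file so large blobs are not loaded whole
            }
            byte[] digest = md.digest();  // SHA-1 digests are 20 bytes, so 8 are always available
            long version = 0L;
            for (int i = 0; i < 8; i++) {
                version = (version << 8) | (digest[i] & 0xFF);
            }
            return version;
        } catch (NoSuchAlgorithmException e) {
            throw new IOException(e);     // SHA-1 is mandatory on every JVM; keep the IOException signature
        }
    }
}{code}
Two Nimbus nodes holding byte-identical blobs would then report the same version regardless of when each downloaded its copy, so a leadership change alone would no longer trigger worker restarts.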
 

  was:
Hey guys, I'm using Storm v2.6.1 and every time I restart the nimbus leader 
(currently I have 3 for high availability) the workers get reassigned and this 
is a bad behaviour as every topology will have no workers running for a certain 
period(until new workers are assigned) due to a Nimbus leadership change.

 


> Worker being reassigned when Nimbus leadership changes
> ------------------------------------------------------
>
>                 Key: STORM-4077
>                 URL: https://issues.apache.org/jira/browse/STORM-4077
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 2.6.1
>            Reporter: Pedro Azevedo
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
