ASF GitHub Bot updated STORM-3024:
    Labels: pull-request-available  (was: )

> Allow scheduling for RAS to happen in the background
> ----------------------------------------------------
>                 Key: STORM-3024
>                 URL: https://issues.apache.org/jira/browse/STORM-3024
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-server
>    Affects Versions: 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Major
>              Labels: pull-request-available
> We have recently run into issues where a strategy on a very large cluster 
> occasionally takes an extra long time to finish scheduling.  This slowness 
> cascades into other issues, such as topologies that cannot be killed 
> because the timer thread is still busy running scheduling.
> The plan is to make scheduling happen in a thread pool.  The main thread will 
> wait for up to a configurable amount of time for the topology to be 
> scheduled, but if it does not complete in that time it will be left to keep 
> running in the background thread in hopes that later on it will be scheduled.
> If the state of the cluster changes while scheduling is happening in the 
> background we will cancel the scheduling, as any assignment it produces 
> may no longer fit on the cluster.  The next time the 
> scheduler runs it will restart the scheduling and hopefully allow the cluster 
> to reach a steady state even if it takes a while, but without blocking kills 
> and other critical operations from happening.
> Note that we are also working on optimizing scheduling so that these 
> issues don't happen in the first place.
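
The plan described above (run scheduling in a thread pool, wait a bounded
time, leave slow runs going in the background, cancel them when the cluster
state changes) could be sketched roughly as follows. This is only an
illustration built on java.util.concurrent, not the code from the pull
request: the class name BackgroundSchedulingSketch, its methods, the String
stand-in for an assignment, and the pool size are all hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from Storm itself.
public class BackgroundSchedulingSketch {
    // Pool size of 2 is an arbitrary choice for the example.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    // Scheduling attempts that have been started but not yet consumed.
    private final Map<String, Future<String>> inFlight = new ConcurrentHashMap<>();

    /**
     * Starts scheduling for the topology, or re-attaches to an attempt that
     * is already running, and waits at most maxWaitMs for it to finish.
     * An empty result means the attempt is still running in the background
     * or was cancelled.
     */
    public Optional<String> schedule(String topologyId, Callable<String> strategy,
                                     long maxWaitMs)
            throws InterruptedException, ExecutionException {
        Future<String> attempt =
            inFlight.computeIfAbsent(topologyId, id -> pool.submit(strategy));
        try {
            String assignment = attempt.get(maxWaitMs, TimeUnit.MILLISECONDS);
            inFlight.remove(topologyId, attempt);
            return Optional.of(assignment);
        } catch (TimeoutException e) {
            // Too slow for this round: keep the attempt for a later call.
            return Optional.empty();
        } catch (CancellationException e) {
            // Cancelled by clusterStateChanged(); a later call starts over.
            inFlight.remove(topologyId, attempt);
            return Optional.empty();
        } catch (ExecutionException e) {
            // The strategy failed; drop the attempt and report the failure.
            inFlight.remove(topologyId, attempt);
            throw e;
        }
    }

    /** Cancels all background attempts, since their results may no longer fit. */
    public void clusterStateChanged() {
        for (Future<String> attempt : inFlight.values()) {
            attempt.cancel(true); // interrupts the worker if it is running
        }
        inFlight.clear();
    }

    public void shutdown() {
        pool.shutdownNow();
    }
}
```

The calling thread is never blocked longer than maxWaitMs, so kills and
other timer-driven operations can proceed while a slow strategy keeps
working. Cancellation here relies on the strategy responding to thread
interruption; a strategy that ignores interrupts would keep occupying a
pool thread until it finishes.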
