[
https://issues.apache.org/jira/browse/STORM-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161132#comment-16161132
]
Jungtaek Lim edited comment on STORM-2693 at 9/11/17 12:17 PM:
---------------------------------------------------------------
[~danny0405]
Just some guidelines for contribution (sorry we didn't document them nicely):
- all patches should ideally be made against master (currently 2.0.0)
- the merger or any committer can decide which version line(s) the patch
should go into, and port it back if necessary
- there's an exceptional case: a patch that addresses a bug which only exists
on a specific version line(s)
Let's get back to your design.
I haven't looked into the details, but I'm wondering how reading/storing
from/to local disk will work with Nimbus H/A. Storm puts all critical data
into ZooKeeper to ensure the data is available at any time (when ZK is not
available Storm will not work anyway...). This would no longer be true if we
stored it only on local disk.
You would want to think about the case where each Nimbus has its own
assignments (which are not in sync) stored locally (only one of them is the
leader, though) and all of them shut down and restart at the same time. Which
Nimbus should become the leader? How do we ensure the elected Nimbus has the
latest assignments?
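One way to answer the second question, as a sketch only (the class names and the versioning scheme below are hypothetical, not Storm's API), is to persist a monotonically increasing version (e.g. the zxid of the last ZK write) alongside the locally stored assignments, and elect the candidate holding the highest version after a full restart:

```java
import java.util.*;

// Hypothetical sketch (names are mine, not Storm's API): each Nimbus keeps
// a monotonically increasing version with its locally stored assignments,
// so after a cluster-wide restart the election can prefer the candidate
// whose local assignments are newest, rather than an arbitrary one.
public class LeaderElectionSketch {

    static final class NimbusCandidate {
        final String id;
        final long assignmentVersion; // version of the locally stored assignments

        NimbusCandidate(String id, long assignmentVersion) {
            this.id = id;
            this.assignmentVersion = assignmentVersion;
        }
    }

    // Pick the candidate with the newest local assignments; break ties by id
    // so the election is deterministic.
    static NimbusCandidate electLeader(List<NimbusCandidate> candidates) {
        return candidates.stream()
                .max(Comparator
                        .comparingLong((NimbusCandidate c) -> c.assignmentVersion)
                        .thenComparing(c -> c.id))
                .orElseThrow(() -> new IllegalStateException("no candidates"));
    }

    public static void main(String[] args) {
        List<NimbusCandidate> candidates = Arrays.asList(
                new NimbusCandidate("nimbus-a", 41),
                new NimbusCandidate("nimbus-b", 43), // has the latest assignments
                new NimbusCandidate("nimbus-c", 42));
        System.out.println(electLeader(candidates).id); // nimbus-b
    }
}
```

This only works if the version is written atomically together with the assignments, which is why anchoring it to the ZK write (the source of truth) seems natural.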
Btw, the idea brings up some thoughts: one of Storm's great points is being
stateless, which is achieved with stable storage (ZK for Storm), and we claim
that the Supervisor can keep working even when Nimbus goes down. Now we have
introduced Nimbus H/A, which makes Nimbus no longer a SPOF, so some components
like the Supervisor may be able to rely on the leader Nimbus instead of
communicating with ZK, if that is much faster and doesn't put too much load
on the leader Nimbus. (This looks like the same approach as in your idea.)
And since we only allow the leader Nimbus to handle assignments, while we
still need to write assignments to ZK, we can 'cache' them within the leader
Nimbus (and update the cache whenever the assignments change; note that we
should still write to ZK first, though) and avoid reading them from ZK.
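The caching idea above amounts to a write-through cache, which could be sketched like this (interface and class names are mine, not Storm's; a stand-in replaces the real ZK-backed store):

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the 'cache in leader Nimbus' idea (names are mine,
// not Storm's API): every update goes to ZK first, and only after the ZK
// write succeeds is the in-memory copy refreshed, so reads are served from
// memory while ZK remains the source of truth.
public class AssignmentCacheSketch {

    /** Stand-in for the ZK-backed assignment store. */
    interface AssignmentStore {
        void write(String topologyId, String assignment); // may throw on ZK failure
    }

    private final AssignmentStore zkStore;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    AssignmentCacheSketch(AssignmentStore zkStore) {
        this.zkStore = zkStore;
    }

    // Write-through: ZK first, then the cache. If the ZK write fails, the
    // cache is untouched, so it can never get ahead of ZK.
    void setAssignment(String topologyId, String assignment) {
        zkStore.write(topologyId, assignment);
        cache.put(topologyId, assignment);
    }

    // Reads on the leader never touch ZK.
    String getAssignment(String topologyId) {
        return cache.get(topologyId);
    }

    public static void main(String[] args) {
        AssignmentCacheSketch cache =
                new AssignmentCacheSketch((topo, a) -> { /* pretend ZK write */ });
        cache.setAssignment("topology-1", "[worker-1, worker-2]");
        System.out.println(cache.getAssignment("topology-1"));
    }
}
```

Ordering the ZK write before the cache update is the key design choice: a crash between the two steps leaves the cache stale, which is recoverable, but never leaves ZK behind the cache.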
We still need to get heartbeats (and statistics) from ZK, but that can be
handled with a similar idea or another approach (like Metrics V2, from the
metrics point of view).
> Topology submission or kill takes too much time when topologies grow to a few
> hundred
> -------------------------------------------------------------------------------------
>
> Key: STORM-2693
> URL: https://issues.apache.org/jira/browse/STORM-2693
> Project: Apache Storm
> Issue Type: Improvement
> Components: storm-core
> Affects Versions: 0.9.6, 1.0.2, 1.1.0, 1.0.3
> Reporter: Yuzhao Chen
> Attachments: 2FA30CD8-AF15-4352-992D-A67BD724E7FB.png,
> D4A30D40-25D5-4ACF-9A96-252EBA9E6EF6.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Now, for a storm cluster with 40 hosts [32 cores/128G memory each] and
> hundreds of topologies, nimbus submission and killing take minutes to
> finish. For example, in a cluster with 300 topologies, it takes about
> 8 minutes to submit a topology, which seriously affects our efficiency.
> So, I checked the nimbus code and found two factors that affect nimbus
> submission/killing time for a scheduling round:
> * reading existing assignments from zookeeper for every topology [takes
> about 4 seconds for a cluster with 300 topologies]
> * reading all the workers' heartbeats and updating the state in the nimbus
> cache [takes about 30 seconds for a cluster with 300 topologies]
> The key here is that Storm currently uses zookeeper to collect heartbeats
> [not RPC], and also keeps the physical plan [assignments] in zookeeper,
> which could be kept entirely locally in nimbus.
> So, I think we should make some changes to storm's heartbeat and assignment
> management.
> For the assignment improvement:
> 1. nimbus will keep the assignments on local disk
> 2. on restart or an HA leader trigger, nimbus will recover assignments from
> zk to local disk
> 3. nimbus will tell each supervisor its assignment through RPC every
> scheduling round
> 4. supervisors will sync assignments at a fixed interval
> For the heartbeat improvement:
> 1. workers will report whether their executors are ok or not to the
> supervisor at a fixed interval
> 2. supervisors will report worker heartbeats to nimbus at a fixed interval
> 3. if a supervisor dies, it will tell nimbus through a runtime hook,
> or nimbus will detect whether the supervisor is still alive
> 4. the supervisor will decide whether a worker is running ok or is invalid,
> and will tell nimbus which executors of every topology are ok
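The heartbeat flow proposed in the description could be sketched as follows, purely as an illustration (all class and method names below are hypothetical, not Storm's API): workers report executor health to their supervisor, which aggregates the results and pushes a single RPC to nimbus per interval, instead of every worker writing heartbeats to zookeeper.

```java
import java.util.*;

// Hypothetical sketch of supervisor-side heartbeat aggregation (names are
// mine, not Storm's API): workers report per-executor health locally, and
// the supervisor flushes the aggregated state to nimbus in one RPC call.
public class HeartbeatReportSketch {

    /** Stand-in for the nimbus RPC endpoint. */
    interface NimbusClient {
        void reportHeartbeats(String supervisorId, Map<String, Boolean> executorOk);
    }

    private final String supervisorId;
    private final NimbusClient nimbus;
    // executor id -> last reported health, as told by the workers
    private final Map<String, Boolean> executorOk = new HashMap<>();

    HeartbeatReportSketch(String supervisorId, NimbusClient nimbus) {
        this.supervisorId = supervisorId;
        this.nimbus = nimbus;
    }

    // Step 1: a worker reports whether one of its executors is ok.
    void onWorkerReport(String executorId, boolean ok) {
        executorOk.put(executorId, ok);
    }

    // Step 2: at a fixed interval, the supervisor pushes the aggregated
    // state to nimbus in a single RPC call.
    void flushToNimbus() {
        nimbus.reportHeartbeats(supervisorId, new HashMap<>(executorOk));
    }

    public static void main(String[] args) {
        HeartbeatReportSketch sup = new HeartbeatReportSketch("sup-1",
                (id, m) -> System.out.println(id + " -> " + m));
        sup.onWorkerReport("topo-1:exec-3", true);
        sup.flushToNimbus();
    }
}
```

The point of the aggregation is load: nimbus receives one call per supervisor per interval rather than one zookeeper write per worker.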
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)