[ 
https://issues.apache.org/jira/browse/STORM-167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rick Kellogg updated STORM-167:
-------------------------------
    Component/s: storm-core

> proposal for storm topology online update
> -----------------------------------------
>
>                 Key: STORM-167
>                 URL: https://issues.apache.org/jira/browse/STORM-167
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: James Xu
>            Assignee: Parth Brahmbhatt
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/540
> Currently, topology code can only be updated by killing the topology and 
> re-submitting a new one. During the kill and re-submit process, some requests 
> may be delayed or fail, which is a problem for an online service. We are 
> therefore considering adding online topology update.
> Mission
> Update the code of a running topology gracefully, one worker after another, 
> without interrupting the service as a whole. Only the topology code is updated, 
> not the topology DAG structure (components, streams, and number of tasks).
> Proposal
> * The client runs "storm update topology-name new-jar-file" to submit an 
> update request with new-jar-file.
> * Nimbus updates the stormdist dir and links topology-dir to the new one.
> * Nimbus updates the topology version in ZooKeeper (zk).
> * The supervisors that are running this topology update it:
> ** Check the topology version in zk; if it differs from the local version, a 
> topology update begins.
> ** Each supervisor schedules the update of the topology's workers at a 
> rand(expect-max-update-time) time point.
> ** sync-supervisor downloads the latest code from Nimbus.
> ** sync-process checks the version in each local worker's heartbeat (to be 
> added); if it differs from the version sync-supervisor downloaded, it kills 
> the worker.
> ** sync-process restarts the killed worker.
> ** The new worker heartbeats to zk with its version (to be added), which can 
> be displayed on the web UI to check update progress.
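
The supervisor-side steps above can be sketched roughly as follows. This is only an illustration of the decision logic, not Storm code: the function name plan_supervisor_update, its parameters, and the dictionary layout are hypothetical, and versions are assumed to be simple comparable values. ZooKeeper access, code download, and process management are left out.

```python
import random


def plan_supervisor_update(local_version, zk_version, worker_heartbeats,
                           expect_max_update_time, rng=random):
    """Decide whether a supervisor should update, when, and which workers to restart.

    All names here are hypothetical; this is not part of Storm's API.
    worker_heartbeats maps a worker id to the version in its last local heartbeat.
    """
    # Step 1: compare the version in zk with the local version.
    if local_version == zk_version:
        return None  # nothing to do

    # Step 2: pick a random delay so supervisors do not all restart at once.
    delay_seconds = rng.uniform(0.0, expect_max_update_time)

    # Steps 3-4: after the new code is downloaded, any worker whose heartbeat
    # version differs from the downloaded version is killed and restarted.
    to_restart = sorted(
        worker_id
        for worker_id, version in worker_heartbeats.items()
        if version != zk_version
    )
    return {"delay_seconds": delay_seconds, "restart": to_restart}
```

Drawing the delay uniformly from the expected maximum update time spreads worker restarts across supervisors, which is what lets the topology keep serving requests while individual workers are replaced.
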
> This feature is deployed in our production clusters. It is really useful for 
> topologies that handle online requests whose callers wait for a response. The 
> topology jar can be updated without taking the entire service offline.
> We hope that this feature is useful for others too.
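
The update progress that the proposal suggests showing on the web UI could be derived from the per-worker heartbeat versions. A minimal sketch, assuming the heartbeats have already been read into a plain dictionary; the function name update_progress and the data layout are hypothetical and not part of Storm:

```python
def update_progress(worker_versions, target_version):
    """Return (updated, total) worker counts for a topology update.

    worker_versions maps a worker id to the version in its latest heartbeat.
    Hypothetical helper for illustration only.
    """
    updated = sum(1 for version in worker_versions.values()
                  if version == target_version)
    return updated, len(worker_versions)
```

For example, with three workers of which two already report the new version, the function returns (2, 3), which a UI could render as "2/3 workers updated".
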



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
