[
https://issues.apache.org/jira/browse/STORM-155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14069393#comment-14069393
]
ASF GitHub Bot commented on STORM-155:
--------------------------------------
Github user iwasakims commented on the pull request:
https://github.com/apache/incubator-storm/pull/194#issuecomment-49672593
The cause seems to be a bug in validate-launched-once. I filed
[STORM-415](https://issues.apache.org/jira/browse/STORM-415) and sent a pull
request.
> Storm rebalancing code causes multiple topologies assigned to a single port
> ---------------------------------------------------------------------------
>
> Key: STORM-155
> URL: https://issues.apache.org/jira/browse/STORM-155
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Reporter: James Xu
>
> https://github.com/nathanmarz/storm/issues/551
> We're seeing an issue where rebalancing topologies on a cluster causes
> workers to be assigned to multiple topologies, which in turn causes
> supervisors to fail. This can be easily reproduced locally by starting
> nimbus and a single supervisor with 4 workers, running several topologies,
> and rebalancing all of them to use 1 worker.
> I tracked the issue to the mk-assignments function in nimbus.clj. In this
> function, the "existing-assignments" binding is assigned a list of all
> topologies with assignments except the one being rebalanced. The comment
> implies this is done to treat all the workers of the topology being
> rebalanced as unused, and that is what actually happens. However, because
> the topology being rebalanced is missing from the "existing-assignments"
> list, the scheduler and other code ignore it completely. As a result, all
> workers assigned to that topology are taken over by other topologies, but
> no changes are made to the topology being rebalanced, effectively leaving
> all of its workers assigned to 2 topologies.
> I think that was not the intended behavior. I made a small change to treat
> all the workers of the topology being rebalanced as dead instead, causing
> them to be reassigned fairly between all the topologies. This seems to work
> well and reliably for us.
> Let me know what you think.
> --- a/src/clj/backtype/storm/daemon/nimbus.clj
> +++ b/src/clj/backtype/storm/daemon/nimbus.clj
> @@ -437,8 +437,11 @@
>      (into {} (for [[tid assignment] existing-assignments
>                     :let [topology-details (.getById topologies tid)
>                           all-executors (topology->executors tid)
> +                         ;; for the topology which wants rebalance (specified by the scratch-topology-id)
> +                         ;; we consider all its executors to be dead so they will be treated
> +                         ;; as free slots in the scheduler code.
>                           alive-executors (if (and scratch-topology-id (= scratch-topology-id tid))
> -                                           all-executors
> +                                           (set nil)
>                                            (set (alive-executors nimbus topology-details all-executors assignment)))]]
>                 {tid alive-executors})))
> @@ -638,11 +641,7 @@
>          ;; read all the assignments
>          assigned-topology-ids (.assignments storm-cluster-state nil)
>          existing-assignments (into {} (for [tid assigned-topology-ids]
> -                                        ;; for the topology which wants rebalance (specified by the scratch-topology-id)
> -                                        ;; we exclude its assignment, meaning that all the slots occupied by its assignment
> -                                        ;; will be treated as free slot in the scheduler code.
> -                                        (when (or (nil? scratch-topology-id) (not= tid scratch-topology-id))
> -                                          {tid (.assignment-info storm-cluster-state tid nil)})))
> +                                        {tid (.assignment-info storm-cluster-state tid nil)}))
>          ;; make the new assignments for topologies
>          topology->executor->node+port (compute-new-topology->executor->node+port
>                                          nimbus
> ----------
> xumingming: The following code needs to be synchronized (with mk-assignments):
> (defn do-rebalance [nimbus storm-id status]
>   (.update-storm! (:storm-cluster-state nimbus)
>                   storm-id
>                   (assoc-non-nil
>                     {:component->executors (:executor-overrides status)}
>                     :num-workers (:num-workers status)))
>   (mk-assignments nimbus :scratch-topology-id storm-id))
> otherwise it will cause a race condition here:
> https://github.com/nathanmarz/storm/blob/master/storm-core/src/jvm/backtype/storm/scheduler/Cluster.java#L264
> then one port will be assigned to multiple topologies.
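> A minimal sketch of that synchronization, assuming nimbus carries a plain
> Object under :submit-lock that the other mk-assignments callers also lock
> on (an assumption here, not shown in this thread):
> (defn do-rebalance [nimbus storm-id status]
>   ;; hold the lock across both the metadata update and the reassignment so
>   ;; no concurrent mk-assignments call can run against half-updated state
>   (locking (:submit-lock nimbus)
>     (.update-storm! (:storm-cluster-state nimbus)
>                     storm-id
>                     (assoc-non-nil
>                       {:component->executors (:executor-overrides status)}
>                       :num-workers (:num-workers status)))
>     (mk-assignments nimbus :scratch-topology-id storm-id)))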
> ----------
> stass: That's not the issue here. I initially thought it was a race
> condition as well, but my problem with rebalancing was in the sequential
> part of the algorithm (as described in the analysis above). Needless to
> say, the patch fixed the issue for us, and I have not seen any multiple
> assignments in the months we have been running with it.
> ----------
> d2r: We are also hitting this issue and would welcome a fix.
> ----------
> nathanmarz: The problem with the proposed patch is that it doesn't treat the
> slots of the topology that's rebalancing as free. It will certainly consider
> its executors as dead, but it won't make use of its slots during
> re-scheduling.
> ----------
> stass: Hi Nathan, thanks for the comment.
> If my reading of the code is right, the workers of the topology being
> rebalanced will be treated as free since they will be marked as dead (that's
> the purpose of the top hunk of the diff), and the scheduling code only looks
> at free workers when assigning workers to topologies. That is also what I
> observed in practice with this patch running.
> Am I missing something?
> ----------
> vinceyang: @nathanmarz, we are also hitting this issue. I read the nimbus
> code, and it seems that during a rebalance, in the mk-assignments function,
> the rebalancing topology's assignment info is out of date, but the
> supervisor does not know this. The problem occurs when another topology's
> assignment uses the same port as the out-of-date assignment. If we remove
> the out-of-date assignment from ZK, the problem does not occur. If this
> idea sounds right, I will work on a fix for this issue.
> ----------
> revans2: We have hit this issue too, and so I have been looking into it. It
> seems that it can happen in two different situations.
> First, a topology is not assigned anything after it previously had slots
> assigned to it. This happens most commonly when rebalancing, because the
> scheduler is not aware that the rebalanced topology had anything assigned
> to it previously, but I have been able to reproduce this with other
> hacked-up schedulers.
> When this happens the supervisor in question will crash continuously until
> one of the topologies is killed.
> The fix seems to be that we should include assigned-topology-ids in
> topology->executor->node+port when missing, but with the topology pointing
> to nil; a sketch follows below.
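> A minimal sketch of that idea as a standalone helper (the name
> with-explicit-empties is hypothetical, not existing Storm code):
> (defn with-explicit-empties
>   "Ensure every previously assigned topology id appears in the new
>    scheduling result; ids the scheduler dropped map to nil, so downstream
>    code sees an explicit empty assignment instead of keeping a stale one."
>   [assigned-topology-ids topology->executor->node+port]
>   (merge (into {} (for [tid assigned-topology-ids] {tid nil}))
>          topology->executor->node+port))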
> Second, the supervisor uses partially written scheduling data from ZK.
> (.set-assignment! storm-cluster-state topology-id assignment) is atomic for a
> single topology, but not for multiple topologies. This means that the
> supervisor can read data from ZK that has had some topologies updated, but
> not all of them.
> When this happens the supervisor will crash and then come back up and recover
> because the rest of the scheduling data was written to ZK.
> The fix for this seems to be that we need to "lock" ZooKeeper with a watch
> file during the update: the supervisors would not read the data until
> nimbus is done updating (a sketch follows below). I don't think this is as
> critical to fix because the supervisor recovers fairly quickly.
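> A rough sketch of the marker-node idea (the node path and the helper are
> hypothetical, and set-data/delete-node stand in for whatever ClusterState
> operations actually apply):
> (defn with-assignments-marker
>   "Run update-fn while a marker node exists in ZK. Supervisors would skip
>    reading assignments while the marker is present and re-read once nimbus
>    deletes it."
>   [storm-cluster-state update-fn]
>   (set-data storm-cluster-state "/assignments-updating" (byte-array 0))
>   (try
>     (update-fn)
>     (finally
>       (delete-node storm-cluster-state "/assignments-updating"))))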
> Does my analysis seem correct? I don't understand all of the code perfectly,
> so I want to be sure I am not missing something.