Koushik Das created CLOUDSTACK-6124:
---------------------------------------
Summary: During MS maintenance unfinished work items are not
cleaned up resulting in them getting repeated for every subsequent maintenance
Key: CLOUDSTACK-6124
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6124
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Components: Management Server
Affects Versions: 4.3.0
Reporter: Koushik Das
Assignee: Koushik Das
Fix For: 4.4.0
During MS shutdown, all pending work items (op_it_work.step != 'Done') owned by
it are picked up by another MS in the cluster. For each pending work item, the
new MS then checks whether the VM is running and, if it is not, tries to start
it (using the same mechanism that is used for VM HA). If the investigators find
that the VM is still alive, no action is needed. This completes the processing
of all pending work items.
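
The pickup step described above could be modelled roughly as follows. This is
only an in-memory sketch, not the actual CloudStack code: the WorkItem and
WorkItemTakeover classes, the ownerMsId field and the takeOver method are
hypothetical names chosen for illustration. Only the 'step' and 'type' values
come from the op_it_work table mentioned in this report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one op_it_work row.
class WorkItem {
    final long vmId;
    final String type;   // e.g. "Starting"
    String step;         // e.g. "Release" or "Done"
    long ownerMsId;      // assumed: id of the MS that currently owns the item

    WorkItem(long vmId, String type, String step, long ownerMsId) {
        this.vmId = vmId;
        this.type = type;
        this.step = step;
        this.ownerMsId = ownerMsId;
    }

    // Mirrors the condition op_it_work.step != 'Done'.
    boolean isPending() {
        return !"Done".equals(step);
    }
}

class WorkItemTakeover {
    // Reassigns every pending item owned by the departing MS to the new MS
    // and returns the items that were taken over.
    static List<WorkItem> takeOver(List<WorkItem> all, long departingMsId, long newMsId) {
        List<WorkItem> taken = new ArrayList<>();
        for (WorkItem item : all) {
            if (item.ownerMsId == departingMsId && item.isPending()) {
                item.ownerMsId = newMsId;
                taken.add(item);
            }
        }
        return taken;
    }
}
```

Items whose step is already 'Done', and items owned by other management
servers, are left untouched in this sketch.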
There appears to be a bug in the code: op_it_work.step is not marked as 'Done'
in the above case, so the work items remain pending forever. As a result, every
time the MS owning these work items is shut down, the work items are picked up
by another MS and the steps described above are repeated.
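
One possible shape of the cleanup is sketched below, assuming the fix is simply
to close an item once the VM is found alive. PendingWork, TakeoverCleanup,
process and the isVmAlive predicate are hypothetical names for illustration;
the predicate stands in for the investigators and is not a CloudStack API.
Leaving the item pending when the VM has to be restarted is also an assumption
of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

// Hypothetical in-memory stand-in for a pending op_it_work row.
class PendingWork {
    final long vmId;
    String step;   // e.g. "Release" or "Done"

    PendingWork(long vmId, String step) {
        this.vmId = vmId;
        this.step = step;
    }
}

class TakeoverCleanup {
    // Processes the items inherited from another MS and returns the ids of
    // the VMs that were found not running and therefore need a restart.
    static List<Long> process(List<PendingWork> inherited, LongPredicate isVmAlive) {
        List<Long> toRestart = new ArrayList<>();
        for (PendingWork item : inherited) {
            if ("Done".equals(item.step)) {
                continue; // already closed, nothing to do
            }
            if (isVmAlive.test(item.vmId)) {
                // No action is needed for the VM, so close the item here;
                // otherwise it would be picked up again on the next takeover.
                item.step = "Done";
            } else {
                // Assumption: the item stays pending until the restart completes.
                toRestart.add(item.vmId);
            }
        }
        return toRestart;
    }
}
```

With this change, a second takeover would only see the items whose VMs were
not alive, instead of the full original set.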
Scenario in which a pending work item may be created: if a VM deployment fails,
type and step are set to 'Starting' and 'Release' respectively. Ideally, if the
operation ends gracefully, the step is updated to 'Done'. But if there is an
abrupt termination, the step may remain 'Release' for some work items. The step
is then never updated to 'Done' for these items, and they are retried every
time a new MS takes over.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)