Ok, review request in. I applied it to master, but I think there's a
whitespace issue or something that keeps it from applying to 4.1. It's
basically Edison's patch plus a logger message
in VirtualMachineManagerImpl.java that tells the admin why stopping the VM
failed (unable to connect to agent).


On Thu, Apr 25, 2013 at 10:13 PM, Marcus Sorensen <shadow...@gmail.com>wrote:

> argh, that version of stopVirtualMachine (the one returning boolean) isn't
> even used!
>
>
> On Thu, Apr 25, 2013 at 10:07 PM, Marcus Sorensen <shadow...@gmail.com>wrote:
>
>> Actually it's that your patch is for the 'forced' stop. Still doesn't
>> apply to 4.1, but the code I'm looking at for the moment is the non-forced
>> stop, which returns a boolean.
>>
>>
>> On Thu, Apr 25, 2013 at 10:05 PM, Marcus Sorensen <shadow...@gmail.com>wrote:
>>
>>> Uh, nevermind, of course I can't return null there because it requires a
>>> boolean in 4.1. I'll play with this a bit and get back with you.
>>>
>>>
>>> On Thu, Apr 25, 2013 at 10:01 PM, Marcus Sorensen 
>>> <shadow...@gmail.com>wrote:
>>>
>>>> Thanks, Edison. That doesn't apply to 4.1, but it's basically the same
>>>> as the first hunk of my patch above, just that it returns null instead of
>>>> throwing an exception.  I'm going to test that now. I didn't see why that
>>>> CloudRuntimeException in the first hunk would be squelched, but the finally
>>>> clause makes sense in the second hunk.
>>>>
>>>>
>>>> On Thu, Apr 25, 2013 at 6:08 PM, Edison Su <edison...@citrix.com>wrote:
>>>>
>>>>> This code will do the magic:
>>>>> } finally {
>>>>> if (!stopped) {
>>>>>                 if (!forced) {
>>>>>                     s_logger.warn("Unable to stop vm " + vm);
>>>>>                     try {
>>>>>                         stateTransitTo(vm, Event.OperationFailed,
>>>>> vm.getHostId());
>>>>>                     } catch (NoTransitionException e) {
>>>>>                         s_logger.warn("Unable to transition the state
>>>>> " + vm);
>>>>>                     }
>>>>>                     return false;
>>>>>                 } else {
>>>>>                     s_logger.warn("Unable to actually stop " + vm + "
>>>>> but continue with release because it's a force stop");
>>>>>                     vmGuru.finalizeStop(profile, answer);
>>>>>                 }
>>>>>             }
>>>>> }
>>>>>
>>>>>
>>>>> Basically, it means, if stop failed and it's not force stop, then mark
>>>>> the vm as running.
>>>>> I think the logic here sounds correct, as cloudstack can't delete the
>>>>> vm(due to the connection between mgt server and kvm agent is broken), then
>>>>> all the operations on the VM should fail, and the VM status shouldn't be
>>>>> changed.
>>>>>
>>>>> But the problem is, the stop api call should fail, as stop vm actually
>>>>> failed.
>>>>> Could you try the following patch:
>>>>>
>>>>> diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> index 7bf04ec..ff20c54 100755
>>>>> --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> @@ -3684,10 +3684,15 @@ public class UserVmManagerImpl extends
>>>>> ManagerBase implements UserVmManager, Use
>>>>>          }
>>>>>
>>>>>          UserVO user = _userDao.findById(userId);
>>>>> -
>>>>> +        boolean status = false;
>>>>>          try {
>>>>>              VirtualMachineEntity vmEntity =
>>>>> _orchSrvc.getVirtualMachine(vm.getUuid());
>>>>> -            vmEntity.stop(new Long(userId).toString());
>>>>> +            status = vmEntity.stop(new Long(userId).toString());
>>>>> +            if (status) {
>>>>> +               return _vmDao.findById(vmId);
>>>>> +            } else {
>>>>> +               return null;
>>>>> +            }
>>>>>          } catch (ResourceUnavailableException e) {
>>>>>              throw new CloudRuntimeException(
>>>>>                      "Unable to contact the agent to stop the virtual
>>>>> machine "
>>>>> @@ -3698,7 +3703,7 @@ public class UserVmManagerImpl extends
>>>>> ManagerBase implements UserVmManager, Use
>>>>>                              + vm, e);
>>>>>          }
>>>>>
>>>>> -        return _vmDao.findById(vmId);
>>>>> +
>>>>>      }
>>>>>
>>>>>      @Override
>>>>>
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Marcus Sorensen [mailto:shadow...@gmail.com]
>>>>> > Sent: Thursday, April 25, 2013 4:46 PM
>>>>> > To: dev@cloudstack.apache.org
>>>>> > Subject: Re: [ACS41] new critical bug
>>>>> >
>>>>> > I tried a few things, including throwing a CloudRuntimeException in
>>>>> several
>>>>> > places where I thought it made sense, such as an empty
>>>>> > AgentUnavailableException catch block, but it doesn't seem to do
>>>>> anything at
>>>>> > all, I don't see it in the logs, even though I do see the debug code
>>>>> that I
>>>>> > placed next to it.  So I give up for now :-)
>>>>> >
>>>>> > diff --git a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > index 7bf04ec..0373690 100755
>>>>> > --- a/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > +++ b/server/src/com/cloud/vm/UserVmManagerImpl.java
>>>>> > @@ -685,7 +685,8 @@ public class UserVmManagerImpl extends
>>>>> > ManagerBase implements UserVmManager, Use
>>>>> >          if (status) {
>>>>> >              return status;
>>>>> >          } else {
>>>>> > -            return status;
>>>>> > +            throw new CloudRuntimeException(
>>>>> > +                    "Unable to stop the virtual machine" + vm );
>>>>> >          }
>>>>> >      }
>>>>> >
>>>>> > diff --git a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > b/server/src/com/cloud/vm/VirtualMachineManagerImpl.
>>>>> > index 2c2986f..e320ff1 100755
>>>>> > --- a/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > +++ b/server/src/com/cloud/vm/VirtualMachineManagerImpl.java
>>>>> > @@ -1083,7 +1083,11 @@ public class VirtualMachineManagerImpl extends
>>>>> > ManagerBase implements VirtualMac
>>>>> >              vmGuru.finalizeStop(profile, answer);
>>>>> >
>>>>> >          } catch (AgentUnavailableException e) {
>>>>> > +            s_logger.error("Unable to stop vm, agent unavailable: "
>>>>> +
>>>>> > e.toString());
>>>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>>>> agent
>>>>> > unavailable");
>>>>> >          } catch (OperationTimedoutException e) {
>>>>> > +            s_logger.error("Unable to stop vm, operation timed out:
>>>>> " +
>>>>> > e.toString());
>>>>> > +            throw new CloudRuntimeException("Unable to stop vm,
>>>>> > + operation
>>>>> > timed out");
>>>>> >          } finally {
>>>>> >              if (!stopped) {
>>>>> >                  if (!forced) {
>>>>> >
>>>>> >
>>>>> > On Thu, Apr 25, 2013 at 2:06 PM, Chip Childers
>>>>> > <chip.child...@sungard.com>wrote:
>>>>> >
>>>>> > > On Thu, Apr 25, 2013 at 02:01:56PM -0600, Marcus Sorensen wrote:
>>>>> > > > I didn't mark this one as a blocker, but it's still pretty bad
>>>>> for
>>>>> > > someone
>>>>> > > > managing VMs in an automated fashion. Trying to stop a VM when
>>>>> the
>>>>> > > > KVM agent is disconnected reports success.
>>>>> > > >
>>>>> > > > https://issues.apache.org/jira/browse/CLOUDSTACK-2195
>>>>> > >
>>>>> > > I'll pull a fix in, if we have one ready before the final blockers.
>>>>> > > Otherwise I'd pull it into a 4.1.1 release.
>>>>> > >
>>>>>
>>>>
>>>>
>>>
>>
>

Reply via email to