mlsorensen opened a new issue, #6746:
URL: https://github.com/apache/cloudstack/issues/6746

   <!--
   Verify first that your issue/request is not already reported on GitHub.
   Also test if the latest release and main branch are affected too.
   Always add information AFTER these HTML comments, but no need to delete 
the comments.
   -->
   
   ##### ISSUE TYPE
    * Enhancement Request
    
   ##### COMPONENT NAME
   ~~~
   Storage
   ~~~
   
   ##### CLOUDSTACK VERSION
   ~~~
   Any
   ~~~
   
   ##### SUMMARY
   Unmapping volumes from a hypervisor host upon VM stop or migration is done 
on a best-effort basis. By the time we try to unmap, the VM is already stopped 
or already migrated, so if something goes wrong there is really no recourse or 
retry; we just log a warning. This leaves the potential to leak mappings to 
hosts over time.
   
   In code review I've also found edge cases where a VM is moved to the "Stopped" 
state without necessarily cleaning up network or volume resources; these can 
also leak mappings over time. Examples include force-removing a hypervisor 
host with running VMs on it, and possibly any other code that simply calls 
`vm.setState(State.Stopped)`.
   
   My request is that we be more thorough during VM start in ensuring that our 
target host, and *only* our target host, has access to the volume, or at least 
call the storage plugin involved and let it decide how to do this. It could be 
as simple as calling the storage service to "revoke all" just before we grant 
access, or allowing for an exclusive grant in the storage API.
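   A rough sketch of that ordering (hypothetical names, not the real CloudStack 
API — `revokeAllAccess`/`grantAccess` here are a stand-in driver interface, with 
a tiny recording implementation so the call order is observable):
   
   ~~~java
   import java.util.ArrayList;
   import java.util.List;
   
   // Sketch of the proposed ordering during VM start: revoke every existing
   // mapping for the volume first, then grant access to the target host only.
   class VmStartFlowSketch {
       interface StorageDriver {
           void revokeAllAccess(String volumeId);
           void grantAccess(String volumeId, String hostId);
       }
   
       static void prepareVolumeForStart(StorageDriver driver, String volumeId, String hostId) {
           driver.revokeAllAccess(volumeId); // clear any leaked mappings first
           driver.grantAccess(volumeId, hostId);
       }
   
       // Tiny recording driver so the ordering is observable in a test.
       static class RecordingDriver implements StorageDriver {
           final List<String> calls = new ArrayList<>();
           public void revokeAllAccess(String volumeId) { calls.add("revokeAll:" + volumeId); }
           public void grantAccess(String volumeId, String hostId) { calls.add("grant:" + volumeId + "->" + hostId); }
       }
   }
   ~~~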
   
   For example, the PowerFlex/ScaleIO storage client has an 
`unmapVolumeFromAllSdcs` call that could be made just prior to granting access 
to volumes during VM start.
   
   We may need to add a `revokeAllAccess()` method to the 
`PrimaryDataStoreDriver`, or add a flag to the existing `revokeAccess` to 
indicate that the storage driver should revoke all access.
   
   Alternatively (I think I like this better), the `grantAccess()` call 
might gain a `boolean exclusive` flag so the driver can ensure that only one 
mapping exists: the one requested. This would be cleaner.
   
   Crucially, we need to avoid exclusive access during the live migration 
workflows, since the source and destination hosts both need access to the 
volume mid-migration. It seems safe to ensure exclusive access during VM 
start, however.
   

