[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brenn Oosterbaan updated CLOUDSTACK-7319:
-----------------------------------------

    Fix Version/s: 4.4.0

> Copy Snapshot command too heavy on XenServer Dom0 resources when using dd to 
> copy incremental snapshots
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CLOUDSTACK-7319
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7319
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: Snapshot, XenServer
>    Affects Versions: 4.0.0, 4.0.1, 4.0.2, 4.1.0, 4.1.1, 4.2.0, Future, 4.2.1, 
> 4.3.0, 4.4.0, 4.5.0, 4.3.1, 4.4.1
>            Reporter: Joris van Lieshout
>            Assignee: Brenn Oosterbaan
>            Priority: Critical
>             Fix For: 4.4.0
>
>
> We noticed that the dd process was far too aggressive on Dom0, causing a
> range of problems on a XenServer host with medium workloads.
> ACS uses the dd command to copy incremental snapshots to secondary storage.
> This process is too heavy on Dom0 resources, impacts DomU performance, and
> can even lead to domain freezes (including Dom0) of more than a minute.
> We've found that this is because the Dom0 kernel caches the read and write
> operations of dd.
> Some of the issues we have seen as a consequence of this are:
> - DomU performance degradation/freezes
> - OVS freezing and not forwarding any traffic, including LACPDUs, resulting
> in the bond going down
> - keepalived heartbeat packets between RRVMs not being sent/received,
> resulting in flapping RRVM master state
> - Breaking snapshot copy processes
> - the XenServer heartbeat script reaching its timeout and fencing the server
> - pool master connection loss
> - ACS marking the host as down and fencing the instances even though they are
> still running on the original host, resulting in the same instance running on
> two hosts in one cluster
> - VHD corruption as a result of some of the issues mentioned above
> We've developed a patch for the XenServer script
> /etc/xapi.d/plugins/vmopsSnapshot that adds the direct flag to both the input
> and output files (iflag=direct oflag=direct).
> Our tests have shown that Dom0 load during snapshot copy is much lower with
> this patch.
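
The change described above can be sketched roughly as follows. This is a
minimal illustration, not the actual vmopsSnapshot code: the function names
build_dd_command and copy_snapshot and the 2M block size are assumptions made
for the example. Only the iflag=direct and oflag=direct operands come from the
issue description; they ask GNU dd to use direct I/O for reads and writes
instead of going through the kernel page cache.

```python
import subprocess


def build_dd_command(src, dst, block_size="2M", direct=True):
    """Build the dd argument list for copying src to dst.

    Hypothetical helper: the name and the default block size are
    assumptions, not taken from the vmopsSnapshot plugin.
    """
    cmd = ["dd", "if=" + src, "of=" + dst, "bs=" + block_size]
    if direct:
        # Request direct I/O on both the input and the output file,
        # as described in the issue (iflag=direct oflag=direct).
        cmd += ["iflag=direct", "oflag=direct"]
    return cmd


def copy_snapshot(src, dst, **kwargs):
    """Run dd and return its exit status (0 on success)."""
    return subprocess.call(build_dd_command(src, dst, **kwargs))
```

Keeping the command construction separate from the subprocess call makes it
possible to check the generated arguments without running dd on a host.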



--
This message was sent by Atlassian JIRA
(v6.2#6252)
