CS User created CLOUDSTACK-7775:
-----------------------------------
Summary: Xen S3 backed secondary storage - local volume snapshots
fail
Key: CLOUDSTACK-7775
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7775
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Affects Versions: 4.4.0, 4.3.1
Reporter: CS User
Xenserver, 6.01 latest hotfixes, cloudstack 4.3 and 4.4. Snapshot of volume on
local disk, to be transferred to S3. When the xenhost attemps the PUT request,
the snapshot is no longer present and so the request fails. Snapshots of
volumes stored on Primary storage work fine and are uploaded to the S3 backed
secondary storage as expected.
Also tried upgrading the hosts to Xenserver 6.2, however it still has the same
issue. On another environment with cloudstack 4.3, with Xenserver 6.01 (no
recent xen hotfixes and no S3 backed secondary storage), local snapshots work
fine.
We see this in the management server logs:
(I have removed references to any hosts from the logs and amended IP's)
{noformat}
2014-10-21 12:36:55,332 WARN [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) Task failed! Task record: uuid:
c520bf5a-1bc1-57ca-f9d0-960179118118
nameLabel: Async.host.call_plugin
nameDescription:
allowedOperations: []
currentOperations: {}
created: Tue Oct 21 12:36:56 BST 2014
finished: Tue Oct 21 12:36:56 BST 2014
status: failure
residentOn: com.xensource.xenapi.Host@1dd85be5
progress: 1.0
type: <none/>
result:
errorInfo: [XENAPI_PLUGIN_FAILURE, getSnapshotSize,
CommandException, 2]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
2014-10-21 12:36:55,386 WARN [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) callHostPlugin failed for cmd: getSnapshotSize
with args snapshotUuid: 0cef2f39-03e8-458d-bf19-7a2294c40ac7, isISCSI: true,
primaryStorageSRUuid: b1802ff2-5a63-d1e7-04de-dea7ba7eab27, due to Task
failed! Task record: uuid: c520bf5a-1bc1-57ca-f9d0-960179118118
nameLabel: Async.host.call_plugin
nameDescription:
allowedOperations: []
currentOperations: {}
created: Tue Oct 21 12:36:56 BST 2014
finished: Tue Oct 21 12:36:56 BST 2014
status: failure
residentOn: com.xensource.xenapi.Host@1dd85be5
progress: 1.0
type: <none/>
result:
errorInfo: [XENAPI_PLUGIN_FAILURE, getSnapshotSize,
CommandException, 2]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
Task failed! Task record: uuid:
c520bf5a-1bc1-57ca-f9d0-960179118118
nameLabel: Async.host.call_plugin
nameDescription:
allowedOperations: []
currentOperations: {}
created: Tue Oct 21 12:36:56 BST 2014
finished: Tue Oct 21 12:36:56 BST 2014
status: failure
residentOn: com.xensource.xenapi.Host@1dd85be5
progress: 1.0
type: <none/>
result:
errorInfo: [XENAPI_PLUGIN_FAILURE, getSnapshotSize,
CommandException, 2]
otherConfig: {}
subtaskOf: com.xensource.xenapi.Task@aaf13f6f
subtasks: []
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.checkForSuccess(CitrixResourceBase.java:3293)
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.callHostPluginAsync(CitrixResourceBase.java:3507)
at
com.cloud.hypervisor.xen.resource.XenServerStorageProcessor.getSnapshotSize(XenServerStorageProcessor.java:1211)
at
com.cloud.hypervisor.xen.resource.XenServerStorageProcessor.backupSnapshot(XenServerStorageProcessor.java:1401)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:94)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:52)
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:546)
at
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:61)
at
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
at
com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:216)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
2014-10-21 12:36:55,391 WARN [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) unable to destroy
task(com.xensource.xenapi.Task@84832980) on
host(45026915-2bf9-4ec1-8b57-f211194ff29b) due to
You gave an invalid object reference. The object may have recently been
deleted. The class parameter gives the type of reference given, and the handle
parameter echoes the bad value given.
at com.xensource.xenapi.Types.checkResponse(Types.java:693)
at com.xensource.xenapi.Connection.dispatch(Connection.java:350)
at
com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:456)
at com.xensource.xenapi.Task.destroy(Task.java:616)
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.callHostPluginAsync(CitrixResourceBase.java:3523)
at
com.cloud.hypervisor.xen.resource.XenServerStorageProcessor.getSnapshotSize(XenServerStorageProcessor.java:1211)
at
com.cloud.hypervisor.xen.resource.XenServerStorageProcessor.backupSnapshot(XenServerStorageProcessor.java:1401)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:94)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:52)
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:546)
at
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:61)
at
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
at
com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:216)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
2014-10-21 12:36:56,777 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-12:null) SeqA 132-143: Processing Seq 132-143: { Cmd ,
MgmtId: -1, via: 132, Ver: v1, Flags: 11,
[{"com.cloud.agent.api.ConsoleProxyLoadReportCommand":{"_proxyVmId":64,"_loadInfo":"{\n
\"connections\": []\n}","wait":0}}] }
2014-10-21 12:36:56,804 DEBUG [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) Host 192.14.225.18
OpaqueRef:a0fd0b23-f978-5f67-b626-1eadc343a141: Removing SR
2014-10-21 12:36:56,808 DEBUG [c.c.a.m.AgentManagerImpl]
(AgentManager-Handler-12:null) SeqA 132-143: Sending Seq 132-143: { Ans: ,
MgmtId: 345052759949, via: 132, Ver: v1, Flags: 100010,
[{"com.cloud.agent.api.AgentControlAnswer":{"result":true,"wait":0}}] }
2014-10-21 12:36:56,818 DEBUG [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) Host 192.14.225.18
OpaqueRef:b02a715a-229e-3758-f4e4-88a7709810b5: Unplugging pbd
2014-10-21 12:36:57,488 DEBUG [c.c.h.x.r.CitrixResourceBase]
(DirectAgent-235:ctx-55338794) Host 192.14.225.18
OpaqueRef:a0fd0b23-f978-5f67-b626-1eadc343a141: Forgetting
2014-10-21 12:36:57,504 WARN [c.c.h.x.r.XenServerStorageProcessor]
(DirectAgent-235:ctx-55338794) BackupSnapshot Failed due to S3 upload of
snapshots 0cef2f39-03e8-458d-bf19-7a2294c40ac7 failed
com.cloud.utils.exception.CloudRuntimeException: S3 upload of snapshots
0cef2f39-03e8-458d-bf19-7a2294c40ac7 failed
at
com.cloud.hypervisor.xen.resource.XenServerStorageProcessor.backupSnapshot(XenServerStorageProcessor.java:1421)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.execute(StorageSubsystemCommandHandlerBase.java:94)
at
com.cloud.storage.resource.StorageSubsystemCommandHandlerBase.handleStorageCommands(StorageSubsystemCommandHandlerBase.java:52)
at
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:546)
at
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:61)
at
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
at
com.cloud.agent.manager.DirectAgentAttache$Task.runInContext(DirectAgentAttache.java:216)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:49)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:56)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:103)
at
org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:53)
at
org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:46)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
2014-10-21 12:36:57,885 DEBUG [c.c.a.ApiServlet]
(catalina-exec-11:ctx-3e69dc37) ===START=== 192.168.20.10 -- GET
command=queryAsyncJobResult&jobId=436e0dd3-283a-471c-aba1-5ecbdd19e0e5&response=json&sessionkey=bC690U9amEIq2aZn5caH5oi0Tog%3D&_=1413891416702
{noformat}
On the xenserver we see the following in the /var/log/cloud/vmops.log file:
{noformat}
2014-10-21 12:36:11 DEBUG [root] #### VMOPS enter
create_secondary_storage_folder ####
2014-10-21 12:36:11 DEBUG [root] create_secondary_storage_folder, args:
{'newFolder': 'snapshots/2/257', 'remoteMountPath':
'hostname.fake.net:/CS_TEST_SEC_01'}
2014-10-21 12:36:11 DEBUG [root] Successfully mounted
hostname.fake.net:/CS_TEST_SEC_01 to
/var/run/cloud_mount/17652326-6d41-44fc-b7e5-7ea0ab7e2d4d
2014-10-21 12:36:11 DEBUG [root] Successfully unmounted
/var/run/cloud_mount/17652326-6d41-44fc-b7e5-7ea0ab7e2d4d
2014-10-21 12:36:11 DEBUG [root] #### VMOPS exit
create_secondary_storage_folder ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS enter get_rule_logs_for_vms ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS enter
network_rules_for_rebooted_vm ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS enter check_domid_changed ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS exit check_domid_changed ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS exit
network_rules_for_rebooted_vm ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS enter
network_rules_for_rebooted_vm ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS enter check_domid_changed ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS exit check_domid_changed ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS exit
network_rules_for_rebooted_vm ####
2014-10-21 12:36:31 DEBUG [root] #### VMOPS exit get_rule_logs_for_vms ####
2014-10-21 12:36:56 DEBUG [root] primarySRPath:
/dev/VG_XenStorage-b1802ff2-5a63-d1e7-04de-dea7ba7eab27
2014-10-21 12:36:58 DEBUG [root] #### VMOPS enter deleteSnapshotBackup ####
2014-10-21 12:36:58 DEBUG [root] Calling deleteSnapshotBackup with
{'localMountPoint':
'/var/run/cloud_mount/ccd200be-0636-32f6-ad98-fb8e2fa8c70c', 'backupUUID':
'0cef2f39-03e8-458d-bf19-7a2294c40ac7', 'secondaryStora
geMountPath': 'hostname.fake.net:/CS_TEST_SEC_01', 'path': 'snapshots/2/257'}
2014-10-21 12:36:58 DEBUG [root] Successfully mounted
hostname.fake.net:/CS_TEST_SEC_01/snapshots to
/var/run/cloud_mount/ccd200be-0636-32f6-ad98-fb8e2fa8c70c
2014-10-21 12:36:58 DEBUG [root] Chdired to
/var/run/cloud_mount/ccd200be-0636-32f6-ad98-fb8e2fa8c70c/2/257
2014-10-21 12:36:58 DEBUG [root] checking existence of
0cef2f39-03e8-458d-bf19-7a2294c40ac7.vhd
2014-10-21 12:36:58 DEBUG [root] backupVHD
0cef2f39-03e8-458d-bf19-7a2294c40ac7.vhd exists.
2014-10-21 12:36:58 DEBUG [root] #### VMOPS exit deleteSnapshotBackup ####
{noformat}
On the xenserver we see the following in the /var/log/cloud/s3xen.log file:
{noformat}
2014-10-21 12:36:57 DEBUG [root] #### VMOPS enter s3 #### ####
2014-10-21 12:36:57 DEBUG [root] #### VMOPS Operation put on file
/dev/VG_XenStorage-b1802ff2-5a63-d1e7-04de-dea7ba7eab27/VHD-0cef2f39-03e8-458d-bf19-7a2294c40ac7
from/in bucket CS_TEST_SEC_01 key
snapshots/2/257/VHD-0cef2f39-03e8-458d-bf19-7a2294c40ac7 ####
2014-10-21 12:36:57 DEBUG [root] #### VMOPS Traceback (most recent call
last):
File "/etc/xapi.d/plugins/s3xen", line 414, in s3
client.put(bucket, key, filename, maxSingleUploadBytes)
File "/etc/xapi.d/plugins/s3xen", line 325, in put
raise Exception(
Exception: Attempt to put
/dev/VG_XenStorage-b1802ff2-5a63-d1e7-04de-dea7ba7eab27/VHD-0cef2f39-03e8-458d-bf19-7a2294c40ac7
that does not exist.
####
2014-10-21 12:36:57 DEBUG [root] #### VMOPS exit s3 with result false ####
{noformat}
It appears that Xen is deleting the file on the local filesystem before the S3
script has a chance to upload it, however the snapshot does appear to have been
successfully transferred to the secondary storage staging area, after which the
S3 transfer is attempted from the local lvm volume. When this fails the
snapshot is deleted, as you can see from the above logs.
It appears that either the issue with Xen deleting the snapshot needs to be
resolved (has this behavior changed recently?) or the PUT request to S3 needs
to send the file which is located in the staging area, rather than the file on
local disk, as happens with the primary storage volume snapshots.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)