Andy Kurth created VCL-1023:
-------------------------------
Summary: Cluster reservations may fail to copy an image if
assigned to multiple VM hosts sharing a datastore
Key: VCL-1023
URL: https://issues.apache.org/jira/browse/VCL-1023
Project: VCL
Issue Type: Bug
Components: vcld (backend)
Affects Versions: 2.4.2
Reporter: Andy Kurth
Assignee: Andy Kurth
Fix For: 2.5
Conditions:
* Cluster request
* Multiple reservations are assigned the same image revision
* Reservations are assigned to VMs on different VMware ESXi hosts
* VMware ESXi hosts share a common virtual disk image datastore
* Image does not yet exist on the datastore and needs to be copied from the
repository
Each vcld process checks if the image needs to be copied from the repository to
the datastore. Since the same image revision was assigned to multiple
reservations in the cluster request, multiple vcld processes determine the
image needs to be copied.
The code does obtain a semaphore before attempting to copy the image. However,
the semaphore name is based on both the VM host name and image name:
{noformat}
2017-03-14 00:25:46|18904|3115170|3222911|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 557fdf0
2017-03-14 00:25:46|18908|3115170|3222912|new|Module.pm:get_semaphore|1601|created 'blade1a1-8-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5023f10
2017-03-14 00:25:47|18913|3115170|3222914|new|Module.pm:get_semaphore|1601|created 'blade1a1-9-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5024518
2017-03-14 00:25:47|18926|3115170|3222918|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 50256d0
2017-03-14 00:26:12|18930|3115170|3222919|new|Module.pm:get_semaphore|1601|created 'blade1a1-11-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5021988
2017-03-14 00:31:18|18917|3115170|3222916|new|Module.pm:get_semaphore|1601|created 'blade1a1-13-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 5578c60
2017-03-14 00:31:24|18922|3115170|3222917|new|Module.pm:get_semaphore|1601|created 'blade1a1-3-/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3' Semaphore object, memory address: 4493e78
{noformat}
The first 5 processes each obtained a semaphore within 30 seconds of each
other. Afterwards, each attempted to copy the .vmdk to the same shared
directory.
The last 2 processes obeyed the semaphore and waited several minutes because
each was assigned to a VM host (blade1a1-13 and blade1a1-3) that was already
in use by an earlier reservation. Once the earlier process on the same VM host
finished attempting to copy the .vmdk and released the semaphore, the last 2
processes checked if the copy was still necessary. This is how it is supposed
to work for all processes copying to the same destination.
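The effect of the host-based name can be illustrated with a short Python
sketch. This is not the vcld code (vcld is written in Perl). The function name
host_based_name is hypothetical, and the name format is inferred only from the
log excerpt in this issue. The host list is the order in which the semaphores
were created in that log.

```python
from collections import Counter

# Datastore path and image name as they appear in the logged semaphore names.
IMAGE_PATH = "/vmfs/volumes/datastore-1a1_mcnc-fas2554/vmwarelinux-RHEL6hkim224213-v3"

# VM hosts in the order the semaphores were created in the log excerpt.
VM_HOSTS = [
    "blade1a1-13", "blade1a1-8", "blade1a1-9", "blade1a1-3",
    "blade1a1-11", "blade1a1-13", "blade1a1-3",
]


def host_based_name(vm_host, image_path):
    """Hypothetical helper: rebuild the name format seen in the log."""
    return f"{vm_host}-{image_path}"


names = [host_based_name(host, IMAGE_PATH) for host in VM_HOSTS]
counts = Counter(names)

# Each distinct name is a separate semaphore, so one process per distinct
# name proceeds immediately. Only a repeated name makes a process wait.
distinct_semaphores = len(counts)
waiting_processes = sum(n - 1 for n in counts.values())

print(distinct_semaphores, waiting_processes)
```

Seven processes target one destination, but only the two that repeat a VM
host name share a semaphore with another process.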
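The code should be updated so the semaphore name does not include the VM host
name. The datastore UUID should be used along with the image revision name, so
that all processes copying the same image to the same datastore contend for
the same semaphore.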
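A minimal Python sketch of the proposed naming follows. It is an illustration
under stated assumptions, not the actual fix: the function names, the
"<uuid>-<image>" format, and the all-zero datastore UUID are hypothetical
placeholders. It assumes every process arrives before any copy finishes, so
the number of distinct names equals the number of simultaneous copy attempts.

```python
# Image revision name taken from the log excerpt in this issue.
IMAGE_NAME = "vmwarelinux-RHEL6hkim224213-v3"

# Placeholder value only. The real datastore UUID is not given in this issue.
DATASTORE_UUID = "00000000-00000000-0000-000000000000"

# VM hosts from the log excerpt, in semaphore creation order.
VM_HOSTS = [
    "blade1a1-13", "blade1a1-8", "blade1a1-9", "blade1a1-3",
    "blade1a1-11", "blade1a1-13", "blade1a1-3",
]


def host_based_name(vm_host, image_name):
    """Hypothetical stand-in for the current scheme: name varies per VM host."""
    return f"{vm_host}-{image_name}"


def datastore_based_name(datastore_uuid, image_name):
    """Hypothetical stand-in for the proposed scheme: name ignores the VM host."""
    return f"{datastore_uuid}-{image_name}"


def simultaneous_copy_attempts(semaphore_names):
    """One process per distinct semaphore name acquires it without waiting."""
    return len(set(semaphore_names))


current = [host_based_name(host, IMAGE_NAME) for host in VM_HOSTS]
proposed = [datastore_based_name(DATASTORE_UUID, IMAGE_NAME) for _ in VM_HOSTS]

print(simultaneous_copy_attempts(current))
print(simultaneous_copy_attempts(proposed))
```

With a name shared across VM hosts, one process copies the image while the
others wait and then re-check whether the copy is still necessary, which is
the behavior described above for the last 2 processes.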