RE: (CLOUDSTACK-692) The CleanupSnapshotBackup process on SSVM deletes snapshots that are still in the process of being copied to secondary storage

Funs Kessen Fri, 20 Sep 2013 10:51:52 -0700

Hi Gents,

For now Daan and I have put in a temporary fix for 4.1.1 (same for 4.0) which 
delays the deletion of the files for a day based on their creation time. This 
means that snapshots that are in progress are not deleted unless they take 
over a day to make. We put the fix in place because we've seen two production 
hypervisors collapse in the past two days: the scavenger removed the files, 
tapdisk/sparse_dd went bananas, and over a prolonged period this hit something 
with NFS in the kernel that the kernel didn't like. Besides this, we're going 
to figure out what we hit in the kernel and hope that the next update cycle 
contains a patch for it; if not, we'll have to conjure one up.
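
To illustrate the idea (this is a minimal sketch, not the actual patch): only 
delete an unrecorded file once it is older than a grace period. The class 
name, method names and the GRACE_PERIOD constant are made up for this example 
and do not appear in the CloudStack code base.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Duration;
import java.time.Instant;

public class SnapshotCleanupGuard {
    // Assumed grace period of one day, matching the delay described above.
    static final Duration GRACE_PERIOD = Duration.ofDays(1);

    // Pure check: true only if the file is strictly older than the grace period.
    static boolean isOldEnoughToDelete(Instant created, Instant now) {
        return Duration.between(created, now).compareTo(GRACE_PERIOD) > 0;
    }

    // Deletes the file only if it has outlived the grace period.
    // Returns true if the file was actually removed.
    static boolean deleteIfStale(Path file, Instant now) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        // Note: where the file system does not record a creation time, Java
        // typically falls back to the last-modified time here.
        Instant created = attrs.creationTime().toInstant();
        if (!isOldEnoughToDelete(created, now)) {
            return false; // possibly still being written by a running backup
        }
        return Files.deleteIfExists(file);
    }
}
```

The cleanup loop would then call deleteIfStale(path, Instant.now()) instead of 
deleting right away. A snapshot that takes longer than the grace period to 
copy is still at risk, so this only narrows the window.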

Cheers,

Funs

-----Original Message-----
From: Joris van Lieshout 
Sent: Friday 20 September 2013 14:51
To: '[email protected]'; '[email protected]'
Cc: 'Daan Hoogland ([email protected])'; Funs Kessen; 
'[email protected]'; 'Hugo Trippaers'
Subject: FW: (CLOUDSTACK-692) The CleanupSnapshotBackup process on SSVM deletes 
snapshots that are still in the process of being copied to secondary storage

Hi Min and Edison,

I hope you don't mind me addressing you directly. I see that you two have done 
most of the work on the Snapshot parts of CS. 
We've been having production-impacting issues due to the bug I tried to 
describe below (and in ticket 692). Yes, it's my first time engaging in the 
community, so I hope I've taken the right approach. :) Also, I did some digging 
around in the CS 4.0, 4.1 and 4.2 code bases and see that large parts of the 
Snapshot process have been revised in 4.2. The issues that we have been having 
were with the 4.0 and 4.1 code bases and, more particularly, are due to the 
"snapshot ... is not recorded in DB, remove it" step in 
CleanupSnapshotBackupCommand of NfsSecondaryStorageResource.java.
Because CleanupSnapshotBackupCommand has been removed in commit 
https://git-wip-us.apache.org/repos/asf?p=cloudstack.git;a=commit;h=27133fba7daefcea6ddba943efb9c96f23dacef2
 I wonder whether this bug has therefore also been solved?

Thanks in advance.

Kind regards,
Joris van Lieshout

-----Original Message-----
From: Joris van Lieshout [mailto:[email protected]]
Sent: Tuesday 17 September 2013 15:56
To: '[email protected]'
Subject: (CLOUDSTACK-692) The CleanupSnapshotBackup process on SSVM deletes 
snapshots that are still in the process of being copied to secondary storage

Hi there,

I was wondering if anyone can help us with this issue? There seems to be a 
situation where the CleanupSnapshotBackup process deletes vhd files belonging 
to an active BackupSnapshot process. I've created CLOUDSTACK-692 for it and 
logged as much info as possible, including the steps I use to clean the mess up 
after we have hit this. We have seen it happen in CS 4.0 and 4.1.1, and from 
what I have seen in the code it probably also exists in 4.2.
I haven't reproduced the issue in a lab because we are hitting it quite often 
in production and UAT, so I have all the examples I need. :) But I guess the 
best way to reproduce it is to create a VM with quite some I/O activity (so 
snapshots will be big), enable hourly snapshots and shorten the 
storage.cleanup.interval global setting so the cleanup process gets triggered 
more frequently.
We are hitting this on XenServer 6.0.2, but if this snapshot cleanup and backup 
process is generally the same across other hypervisor types, I would imagine 
this is relevant for those as well...

Kind regards,
Joris van Lieshout
