Sangeetha Hariharan created CLOUDSTACK-5499:
-----------------------------------------------

             Summary: VMware - When NFS was down for about 12 hours and then 
brought back up again, snapshots are not being attempted for some of the 
volumes which have snapshots that are in "CreatedOnPrimary" state.
                 Key: CLOUDSTACK-5499
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5499
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.3.0
         Environment: Build from 4.3
            Reporter: Sangeetha Hariharan
            Priority: Critical
             Fix For: 4.3.0


VMware - When NFS was down for about 12 hours and then brought back up again, 
snapshots are not being attempted for some of the volumes which have snapshots 
that are in "CreatedOnPrimary" state.

Setup:
Advanced Zone with 2 ESXi 5.1 hosts.

Steps to reproduce the problem:

1. Deploy 5 VMs on each of the hosts, so we start with 11 VMs.
2. Start concurrent snapshots for the ROOT volumes of all the VMs.
3. Shut down the secondary storage server while the snapshots are in progress.
4. Bring the Secondary storage server up after 12 hours.

The following issues were seen in this run:

1. I see that the snapshots that were in progress report failures only after 
12 hours, even though backup.snapshot.wait is set to 12 hours.

2. New snapshot requests that were executed while the NFS server was down do 
not report failure immediately. In my case, I see that such requests 
eventually succeeded once the NFS server was brought back up. Is this the 
expected behavior? Should we not expect them to fail right away, instead of 
holding on to such active sessions?

3. Some of the snapshot failures resulted in snapshots that are in 
"CreatedOnPrimary" state. For such volumes, snapshots are not being attempted 
at all, even though the NFS server was brought back up.

Volumes in this state are 16, 17, 18 and 22.

There are instances where I have seen snapshots being scheduled and 
succeeding even when the previous state was "CreatedOnPrimary". Why were we 
able to schedule snapshots in such cases, and not in others?

mysql> select volume_id,status,created from snapshots where volume_id=18;
+-----------+------------------+---------------------+
| volume_id | status           | created             |
+-----------+------------------+---------------------+
|        18 | Destroyed        | 2013-12-12 23:24:14 |
|        18 | CreatedOnPrimary | 2013-12-12 23:53:39 |
|        18 | BackedUp         | 2013-12-13 01:53:38 |
|        18 | CreatedOnPrimary | 2013-12-13 03:53:38 |
+-----------+------------------+---------------------+



mysql> select volume_id,status,created from snapshots;
+-----------+------------------+---------------------+
| volume_id | status           | created             |
+-----------+------------------+---------------------+
|        22 | Destroyed        | 2013-12-12 23:24:13 |
|        21 | Destroyed        | 2013-12-12 23:24:13 |
|        20 | Destroyed        | 2013-12-12 23:24:14 |
|        19 | Destroyed        | 2013-12-12 23:24:14 |
|        18 | Destroyed        | 2013-12-12 23:24:14 |
|        17 | Destroyed        | 2013-12-12 23:24:14 |
|        16 | Destroyed        | 2013-12-12 23:24:14 |
|        14 | Destroyed        | 2013-12-12 23:24:15 |
|        25 | Destroyed        | 2013-12-12 23:24:15 |
|        24 | Destroyed        | 2013-12-12 23:24:15 |
|        23 | Destroyed        | 2013-12-12 23:24:15 |
|        22 | CreatedOnPrimary | 2013-12-12 23:53:38 |
|        21 | Destroyed        | 2013-12-12 23:53:38 |
|        20 | Destroyed        | 2013-12-12 23:53:38 |
|        19 | Destroyed        | 2013-12-12 23:53:39 |
|        18 | CreatedOnPrimary | 2013-12-12 23:53:39 |
|        17 | CreatedOnPrimary | 2013-12-12 23:53:40 |
|        16 | CreatedOnPrimary | 2013-12-12 23:53:40 |
|        14 | Destroyed        | 2013-12-12 23:53:40 |
|        25 | Destroyed        | 2013-12-12 23:53:41 |
|        24 | Destroyed        | 2013-12-12 23:53:41 |
|        23 | Destroyed        | 2013-12-12 23:53:42 |
|        21 | Destroyed        | 2013-12-13 00:53:37 |
|        19 | Destroyed        | 2013-12-13 00:53:38 |
|        22 | BackedUp         | 2013-12-13 01:53:37 |
|        21 | Destroyed        | 2013-12-13 01:53:38 |
|        20 | Destroyed        | 2013-12-13 01:53:38 |
|        19 | Destroyed        | 2013-12-13 01:53:38 |
|        18 | BackedUp         | 2013-12-13 01:53:38 |
|        17 | BackedUp         | 2013-12-13 01:53:38 |
|        16 | BackedUp         | 2013-12-13 01:53:39 |
|        14 | Destroyed        | 2013-12-13 01:53:39 |
|        25 | Destroyed        | 2013-12-13 01:53:39 |
|        24 | Destroyed        | 2013-12-13 01:53:39 |
|        23 | Destroyed        | 2013-12-13 01:53:40 |
|        22 | CreatedOnPrimary | 2013-12-13 03:53:37 |
|        21 | Destroyed        | 2013-12-13 03:53:38 |
|        20 | Destroyed        | 2013-12-13 03:53:38 |
|        19 | Destroyed        | 2013-12-13 03:53:38 |
|        18 | CreatedOnPrimary | 2013-12-13 03:53:38 |
|        17 | CreatedOnPrimary | 2013-12-13 03:53:38 |
|        16 | CreatedOnPrimary | 2013-12-13 03:53:39 |
|        14 | Destroyed        | 2013-12-13 03:53:39 |
|        24 | Destroyed        | 2013-12-13 08:53:37 |
|        25 | Destroyed        | 2013-12-13 09:53:37 |
|        23 | Destroyed        | 2013-12-13 10:53:37 |
|        21 | Destroyed        | 2013-12-13 16:53:37 |
|        20 | Destroyed        | 2013-12-13 16:53:38 |
|        19 | Destroyed        | 2013-12-13 16:53:38 |
|        14 | Destroyed        | 2013-12-13 16:53:38 |
|        21 | BackedUp         | 2013-12-13 18:53:37 |
|        20 | BackedUp         | 2013-12-13 18:53:38 |
|        19 | BackedUp         | 2013-12-13 18:53:38 |
|        14 | BackedUp         | 2013-12-13 18:53:38 |
|        25 | BackedUp         | 2013-12-13 18:53:38 |
|        24 | BackedUp         | 2013-12-13 18:53:38 |
|        23 | BackedUp         | 2013-12-13 18:53:39 |
|        21 | BackedUp         | 2013-12-13 19:53:37 |
|        20 | BackedUp         | 2013-12-13 19:53:38 |
|        19 | BackedUp         | 2013-12-13 19:53:38 |
|        14 | BackedUp         | 2013-12-13 19:53:38 |
|        25 | BackedUp         | 2013-12-13 19:53:38 |
|        24 | BackedUp         | 2013-12-13 19:53:39 |
|        23 | BackedUp         | 2013-12-13 19:53:39 |
+-----------+------------------+---------------------+
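
The affected volumes can be picked out of a listing like the one above by 
selecting, per volume, only the most recent snapshot row and keeping those 
whose status is still "CreatedOnPrimary". The following is a minimal sketch, 
not part of CloudStack: it loads the rows for volumes 18 and 21 (copied from 
the output above) into an in-memory SQLite table that stands in for the MySQL 
snapshots table. The three-column table layout, the function name 
stuck_volumes and the use of SQLite are assumptions made for illustration. 
The SELECT itself uses only standard SQL, so it should carry over to MySQL, 
but it has not been run against the real CloudStack schema.

```python
import sqlite3

# Rows copied from the query output above (volumes 18 and 21 only).
ROWS = [
    (18, "Destroyed", "2013-12-12 23:24:14"),
    (18, "CreatedOnPrimary", "2013-12-12 23:53:39"),
    (18, "BackedUp", "2013-12-13 01:53:38"),
    (18, "CreatedOnPrimary", "2013-12-13 03:53:38"),
    (21, "Destroyed", "2013-12-12 23:24:13"),
    (21, "Destroyed", "2013-12-12 23:53:38"),
    (21, "Destroyed", "2013-12-13 00:53:37"),
    (21, "Destroyed", "2013-12-13 01:53:38"),
    (21, "Destroyed", "2013-12-13 03:53:38"),
    (21, "Destroyed", "2013-12-13 16:53:37"),
    (21, "BackedUp", "2013-12-13 18:53:37"),
    (21, "BackedUp", "2013-12-13 19:53:37"),
]

# For each volume, find its newest snapshot row and keep it only if that
# newest row is still in CreatedOnPrimary state.
LATEST_STUCK_SQL = """
SELECT s.volume_id, s.status, s.created
FROM snapshots s
JOIN (SELECT volume_id, MAX(created) AS latest
      FROM snapshots
      GROUP BY volume_id) m
  ON m.volume_id = s.volume_id AND m.latest = s.created
WHERE s.status = 'CreatedOnPrimary'
ORDER BY s.volume_id
"""


def stuck_volumes(rows):
    """Return (volume_id, status, created) for volumes whose latest
    snapshot is in CreatedOnPrimary state.

    Hypothetical helper: uses a reduced, in-memory copy of the table.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE snapshots "
            "(volume_id INTEGER, status TEXT, created TEXT)"
        )
        conn.executemany("INSERT INTO snapshots VALUES (?, ?, ?)", rows)
        return conn.execute(LATEST_STUCK_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in stuck_volumes(ROWS):
        print(row)
```

With these rows, volume 18 is reported because its newest row (03:53:38) is 
"CreatedOnPrimary", while volume 21 is not because its newest row is 
"BackedUp". The timestamps are compared as text, which works here only 
because they are in fixed-width "YYYY-MM-DD HH:MM:SS" form.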




--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
