[
https://issues.apache.org/jira/browse/CLOUDSTACK-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856286#comment-13856286
]
Koushik Das commented on CLOUDSTACK-5600:
-----------------------------------------
Simultaneous HA jobs for CPVM v-1-VM are getting scheduled and running (see
'work-1' and 'work-8' in the logs).
I also ran the following grep and found that the CPVM agent state is updated
continuously, which indicates that simultaneous operations are indeed happening
on the CPVM. In the start/stopVM commands being issued, I see that
execute.in.sequence is set to false. Earlier this used to be true.
grep "Agent status update:" management-server.log | grep v-1-VM | more
2013-12-20 16:03:18,270 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-1:ctx-e2636613) Agent status update: [id = 3; name =
v-1-VM; old status = Creating; event = AgentConnected; new status = Connecting;
old update count = 0; new update count = 1]
2013-12-20 16:03:18,927 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-1:ctx-e2636613) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 1; new update count = 2]
2013-12-20 17:39:22,617 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-3:ctx-af2dcfc5) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 2; new update count = 3]
2013-12-20 17:39:22,656 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-4:ctx-a295c4fa) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = AgentConnected; new status =
Connecting; old update count = 3; new update count = 4]
2013-12-20 17:39:22,953 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-d7fea1a5)
Agent status update: [id = 3; name = v-1-VM; old status = Connecting; event =
AgentDisconnected; new status = Alert; old update count = 4; new update count =
5]
2013-12-20 17:39:23,266 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-3:ctx-af2dcfc5) Agent status update: [id = 3; name =
v-1-VM; old status = Alert; event = AgentDisconnected; new status = Alert; old
update count = 5; new update count = 6]
2013-12-20 17:39:23,708 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-4:ctx-a295c4fa) Agent status update: [id = 3; name =
v-1-VM; old status = Alert; event = AgentDisconnected; new status = Alert; old
update count = 6; new update count = 7]
2013-12-20 17:39:32,989 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-7:ctx-90a950a1) Agent status update: [id = 3; name =
v-1-VM; old status = Alert; event = AgentConnected; new status = Connecting;
old update count = 7; new update count = 8]
2013-12-20 17:39:33,468 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-7:ctx-90a950a1) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 8; new update count = 9]
2013-12-20 17:39:33,906 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-8:ctx-e1cd6751) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 9; new update count = 10]
2013-12-20 17:39:34,088 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-8:ctx-e1cd6751) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 10; new update count = 11]
2013-12-20 17:39:44,154 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-9:ctx-79eb8780) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 11; new update count = 12]
2013-12-20 17:39:44,446 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-9:ctx-79eb8780) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 12; new update count = 13]
2013-12-20 17:39:54,364 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-10:ctx-6c386eb1) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 13; new update count = 14]
2013-12-20 17:39:54,559 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-10:ctx-6c386eb1) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 14; new update count = 15]
2013-12-20 17:40:04,543 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-11:ctx-a66eac4f) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 15; new update count = 16]
2013-12-20 17:40:04,715 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-11:ctx-a66eac4f) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 16; new update count = 17]
2013-12-20 17:40:14,748 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-12:ctx-39b502b7) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 17; new update count = 18]
2013-12-20 17:40:14,914 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-12:ctx-39b502b7) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 18; new update count = 19]
2013-12-20 17:40:24,940 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-13:ctx-0b3bf833) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 19; new update count = 20]
2013-12-20 17:40:25,152 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-13:ctx-0b3bf833) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 20; new update count = 21]
2013-12-20 17:40:35,122 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-14:ctx-fc534976) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 21; new update count = 22]
2013-12-20 17:40:35,311 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-14:ctx-fc534976) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 22; new update count = 23]
2013-12-20 17:40:45,312 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-15:ctx-c0d624cd) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 23; new update count = 24]
2013-12-20 17:40:45,483 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-15:ctx-c0d624cd) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 24; new update count = 25]
2013-12-20 17:40:55,506 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-16:ctx-392fc3e1) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 25; new update count = 26]
2013-12-20 17:40:55,668 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-16:ctx-392fc3e1) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 26; new update count = 27]
2013-12-20 17:41:05,690 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-17:ctx-b1d8cedd) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 27; new update count = 28]
2013-12-20 17:41:05,859 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-17:ctx-b1d8cedd) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 28; new update count = 29]
2013-12-20 17:41:15,866 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-18:ctx-b52b1084) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 29; new update count = 30]
2013-12-20 17:41:16,028 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-18:ctx-b52b1084) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 30; new update count = 31]
2013-12-20 17:41:26,042 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-19:ctx-c142c967) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 31; new update count = 32]
2013-12-20 17:41:26,206 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-19:ctx-c142c967) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = Ready; new status = Up; old update
count = 32; new update count = 33]
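The grep output above can also be checked mechanically. The following is a minimal sketch, not part of CloudStack: it assumes the log format shown above, and the names parse_updates, overlapping_threads and the one-second window are hypothetical choices made for illustration. It re-joins the wrapped lines, extracts each "Agent status update" entry, and flags consecutive updates for the agent that were written by different threads within a short interval, which is one rough indicator of simultaneous operations.

```python
import re
from datetime import datetime, timedelta

# Matches one "Agent status update" entry once whitespace has been collapsed.
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) DEBUG \[c\.c\.h\.Status\] "
    r"\((?P<thread>[^:)]+):[^)]*\) Agent status update: "
    r"\[id = (?P<id>\d+); name = (?P<name>[^;]+); "
    r"old status = (?P<old>\w+); event = (?P<event>\w+); "
    r"new status = (?P<new>\w+); "
    r"old update count = (?P<old_count>\d+); "
    r"new update count = (?P<new_count>\d+)\]"
)


def parse_updates(text):
    """Return one dict per status update found in (possibly wrapped) log text."""
    flat = " ".join(text.split())  # undo the line wrapping added by the mailer
    updates = []
    for match in ENTRY.finditer(flat):
        entry = match.groupdict()
        entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
        updates.append(entry)
    return updates


def overlapping_threads(updates, window=timedelta(seconds=1)):
    """Return pairs of consecutive updates made by different threads within `window`.

    This is a heuristic (an assumption of this sketch), not proof of a race.
    """
    ordered = sorted(updates, key=lambda u: u["ts"])
    hits = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["thread"] != prev["thread"] and cur["ts"] - prev["ts"] <= window:
            hits.append((prev, cur))
    return hits


# Three entries copied from the log above, wrapped as they appear in the mail.
SAMPLE = """
2013-12-20 17:39:22,617 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-3:ctx-af2dcfc5) Agent status update: [id = 3; name =
v-1-VM; old status = Up; event = AgentConnected; new status = Connecting; old
update count = 2; new update count = 3]
2013-12-20 17:39:22,656 DEBUG [c.c.h.Status]
(AgentConnectTaskPool-4:ctx-a295c4fa) Agent status update: [id = 3; name =
v-1-VM; old status = Connecting; event = AgentConnected; new status =
Connecting; old update count = 3; new update count = 4]
2013-12-20 17:39:22,953 DEBUG [c.c.h.Status] (AgentTaskPool-10:ctx-d7fea1a5)
Agent status update: [id = 3; name = v-1-VM; old status = Connecting; event =
AgentDisconnected; new status = Alert; old update count = 4; new update count =
5]
"""

if __name__ == "__main__":
    for prev, cur in overlapping_threads(parse_updates(SAMPLE)):
        gap_ms = (cur["ts"] - prev["ts"]).total_seconds() * 1000
        print(f'{prev["thread"]} -> {cur["thread"]}: {gap_ms:.0f} ms apart, '
              f'{cur["old"]} -> {cur["new"]} on {cur["event"]}')
```

To run it against a real log, pass the contents of management-server.log (or the grep output) to parse_updates instead of SAMPLE. The window is a judgment call and may need adjusting for the environment.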
> Xenserver - After HA , CPVM's disk is corrupted resulting in CPVM being stuck
> in "Starting" state.
> --------------------------------------------------------------------------------------------------
>
> Key: CLOUDSTACK-5600
> URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5600
> Project: CloudStack
> Issue Type: Bug
> Security Level: Public(Anyone can view this level - this is the
> default.)
> Components: Management Server
> Affects Versions: 4.3.0
> Environment: Build from 4.3
> Reporter: Sangeetha Hariharan
> Priority: Critical
> Fix For: 4.3.0
>
> Attachments: cpvmha.rar, hacpvm.png
>
>
> XenServer - After HA, the CPVM's disk is corrupted, resulting in the CPVM
> being stuck in "Starting" state.
> Set up:
> Advanced zone with 2 XenServer 6.2 hosts.
> Steps to reproduce the problem:
> Deploy a few HA-enabled VMs on each of the hosts.
> Disconnect network connectivity on host1 (ifconfig eth0 down).
> The host gets marked as down and all VMs get HA-ed to the other host in the
> cluster, host2.
> The CPVM got HA-ed to host2 and worked fine.
> host1 gets rebooted and is marked as "Up" in CP.
> Now disconnect network connectivity on host2 (ifconfig eth0 down).
> The host gets marked as down and all VMs get HA-ed to the other host in the
> cluster, host1.
> After this HA process, I see that the CPVM is stuck in "Starting" state in
> CP, but is in "Running" state in XenServer.
> When I log into the console of the CPVM, I see the following error
> suggesting disk corruption:
> Duplicate or bad block in use!
> /dev/xvda5: Multiply-claimed block(s) in inode 224: 8455 8456
> /dev/xvda5: Multiply-claimed block(s) in inode 2026: 8455 8456
> /dev/xvda5: (There are 2 inodes containing multiply-claimed blocks.)
> /dev/xvda5: File /etc/inittab (inode #224, mod time Sat Dec 21 00:14:41 2013)
> has 2 multiply-claimed block(s), shared with 1 file(s):
> /dev/xvda5: /etc/iptables/rules.v4 (inode #2026, mod time Fri Dec 20 22:39:20
> 2013)
> /dev/xvda5:
> /dev/xvda5: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> (i.e., without -a or -p options)
> fsck died with exit status 4
> failed (code 4).
> An automatic file system check (fsck) of the root filesystem failed. A manual
> fsck must be performed, then the system restarted. The fsck should be
> performed in maintenance mode with the root filesystem mounted in read-only
> mode. ... failed!
> The root filesystem is currently mounted in read-only mode. A maintenance
> shell will now be started. After performing system maintenance, press
> CONTROL-D to terminate the maintenance shell and restart the system. ...
> (warning).
> Give root password for maintenance
> (or type Control-D to continue):