[ https://issues.apache.org/jira/browse/CLOUDSTACK-5432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13857335#comment-13857335 ]

Marcus Sorensen commented on CLOUDSTACK-5432:
---------------------------------------------

Yeah, sorry. This is due to LibvirtStorageAdaptor's 
disconnectPhysicalDiskByPath. It has this old code, copied from 
LibvirtComputingResource into LibvirtStorageAdaptor:

    public boolean cleanupDisk(Connect conn, DiskDef disk) {
        // need to umount secondary storage
        String path = disk.getDiskPath();
        String poolUuid = null;
        if (path.endsWith("systemvm.iso")) {
            // no need to clean up the systemvm ISO, as it's stored locally
            return true;
        }
        if (path != null) {
            // secondary storage paths look like /mnt/<poolUuid>/<file>,
            // so splitting on "/" puts the pool UUID at index 2
            String[] token = path.split("/");
            if (token.length > 3) {
                poolUuid = token[2];
            }
        }

        if (poolUuid == null) {
            return true;
        }

        try {
            // we use libvirt as the storage adaptor since we passed a libvirt
            // connection to cleanupDisk. We pass a storage type that maps
            // to the libvirt adaptor.
            KVMStoragePool pool = _storagePoolMgr.getStoragePool(
                                      StoragePoolType.Filesystem, poolUuid);
            if (pool != null) {
                // deleting the pool unmounts the secondary storage mount
                _storagePoolMgr.deleteStoragePool(pool.getType(), pool.getUuid());
            }
            return true;
        } catch (CloudRuntimeException e) {
            return false;
        }
    }
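
For reference, the poolUuid parsing above assumes secondary-storage paths of 
the form /mnt/<poolUuid>/<file>. A standalone illustration (the class and file 
name are made up for the example; the UUID is the one from the umount failures 
in the libvirt log below):

    public class PathParseDemo {
        public static void main(String[] args) {
            // an ISO path of the shape cleanupDisk() expects
            String path = "/mnt/41b632b5-40b3-3024-a38b-ea259c72579f/example.iso";
            // split("/") on a path with a leading "/" yields
            // ["", "mnt", "<poolUuid>", "<file>"], so token[2] is the
            // pool UUID that ends up getting unmounted
            String[] token = path.split("/");
            System.out.println(token[2]); // 41b632b5-40b3-3024-a38b-ea259c72579f
        }
    }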

The issue is that this code used to be called only for ISOs. Now it is called 
for all disks, since we need to check with all storage adaptors to see whether 
they have anything to clean up; we care about more than just libvirt's storage 
now. So LibvirtStorageAdaptor just needs to be adjusted to execute this code 
only for ISOs, and things will be back as they were. Testing the fix now.
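
To make the shape of that guard concrete, here is a minimal sketch (class and 
method names are illustrative, not the actual CloudStack patch): bail out of 
the libvirt adaptor's cleanup for anything that is not an ISO, so volumes 
belonging to other storage adaptors are left alone.

    public class IsoCleanupGuard {
        // Sketch only: return true when the path looks like a
        // secondary-storage ISO that the libvirt adaptor itself mounted,
        // false otherwise.
        public static boolean shouldCleanup(String localPath) {
            return localPath != null && localPath.endsWith(".iso");
        }

        public static void main(String[] args) {
            // an ISO under a secondary-storage mount point: clean it up
            System.out.println(shouldCleanup(
                "/mnt/41b632b5-40b3-3024-a38b-ea259c72579f/example.iso")); // true
            // a block device managed by another adaptor: leave it alone
            System.out.println(shouldCleanup("/dev/vg0/some-volume")); // false
        }
    }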

> [Automation] libvirtd crashing and agent going to Alert state
> --------------------------------------------------------------
>
>                 Key: CLOUDSTACK-5432
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-5432
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: KVM
>    Affects Versions: 4.3.0
>         Environment: KVM (RHEL 6.3)
> Branch : 4.3
>            Reporter: Rayees Namathponnan
>            Assignee: Marcus Sorensen
>            Priority: Blocker
>             Fix For: 4.3.0
>
>         Attachments: KVM_Automation_Dec_11.rar, agent1.rar, agent2.rar, 
> management-server.rar
>
>
> This issue is observed in the 4.3 automation environment; libvirt crashed 
> and the CloudStack agent went to the Alert state.
> Please see the agent log; the connection between the agent and the MS was 
> lost with the error "Connection closed with -1 on reading size." @ 2013-12-09 19:47:06,969
> 2013-12-09 19:43:41,495 DEBUG [cloud.agent.Agent] 
> (agentRequest-Handler-2:null) Processing command: 
> com.cloud.agent.api.GetStorageStatsCommand
> 2013-12-09 19:47:06,969 DEBUG [utils.nio.NioConnection] (Agent-Selector:null) 
> Location 1: Socket Socket[addr=/10.223.49.195,port=8250,localport=40801] 
> closed on read.  Probably -1 returned: Connection closed with -1 on reading 
> size.
> 2013-12-09 19:47:06,969 DEBUG [utils.nio.NioConnection] (Agent-Selector:null) 
> Closing socket Socket[addr=/10.223.49.195,port=8250,localport=40801]
> 2013-12-09 19:47:06,969 DEBUG [cloud.agent.Agent] (Agent-Handler-3:null) 
> Clearing watch list: 2
> 2013-12-09 19:47:11,969 INFO  [cloud.agent.Agent] (Agent-Handler-3:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-09 19:47:11,970 INFO  [cloud.agent.Agent] (Agent-Handler-3:null) 
> Cannot connect because we still have 5 commands in progress.
> 2013-12-09 19:47:16,970 INFO  [cloud.agent.Agent] (Agent-Handler-3:null) Lost 
> connection to the server. Dealing with the remaining commands...
> 2013-12-09 19:47:16,990 INFO  [cloud.agent.Agent] (Agent-Handler-3:null) 
> Cannot connect because we still have 5 commands in progress.
> 2013-12-09 19:47:21,990 INFO  [cloud.agent.Agent] (Agent-Handler-3:null) Lost 
> connection to the server. Dealing with the remaining commands.. 
> Please see the libvirtd log at the same time (see the attached complete 
> log; there is a 5-hour difference between the agent log and the libvirt log).
> 2013-12-10 02:45:45.563+0000: 5938: error : qemuMonitorIO:574 : internal 
> error End of file from monitor
> 2013-12-10 02:45:47.663+0000: 5942: error : virCommandWait:2308 : internal 
> error Child process (/bin/umount /mnt/41b632b5-40b3-3024-a38b-ea259c72579f) 
> status unexpected: exit status 16
> 2013-12-10 02:45:53.925+0000: 5943: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet14 root) status unexpected: 
> exit status 2
> 2013-12-10 02:45:53.929+0000: 5943: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet14 ingress) status 
> unexpected: exit status 2
> 2013-12-10 02:45:54.011+0000: 5943: warning : qemuDomainObjTaint:1297 : 
> Domain id=71 name='i-45-97-QA' uuid=7717ba08-be84-4b63-a674-1534f9dc7bef is 
> tainted: high-privileges
> 2013-12-10 02:46:33.070+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet12 root) status unexpected: 
> exit status 2
> 2013-12-10 02:46:33.081+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet12 ingress) status 
> unexpected: exit status 2
> 2013-12-10 02:46:33.197+0000: 5940: warning : qemuDomainObjTaint:1297 : 
> Domain id=72 name='i-47-111-QA' uuid=7fcce58a-96dc-4207-9998-b8fb72b446ac is 
> tainted: high-privileges
> 2013-12-10 02:46:36.394+0000: 5938: error : qemuMonitorIO:574 : internal 
> error End of file from monitor
> 2013-12-10 02:46:37.685+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/bin/umount /mnt/41b632b5-40b3-3024-a38b-ea259c72579f) 
> status unexpected: exit status 16
> 2013-12-10 02:46:57.869+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet15 root) status unexpected: 
> exit status 2
> 2013-12-10 02:46:57.873+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet15 ingress) status 
> unexpected: exit status 2
> 2013-12-10 02:46:57.925+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet17 root) status unexpected: 
> exit status 2
> 2013-12-10 02:46:57.933+0000: 5940: error : virCommandWait:2308 : internal 
> error Child process (/sbin/tc qdisc del dev vnet17 ingress) status 
> unexpected: exit status 2
> 2013-12-10 02:46:58.034+0000: 5940: warning : qemuDomainObjTaint:1297 : 
> Domain id=73 name='r-114-QA' uuid=8ded6f1b-69e7-419d-8396-5795372d0ae2 is 
> tainted: high-privileges
> 2013-12-10 02:47:22.762+0000: 5938: error : qemuMonitorIO:574 : internal 
> error End of file from monitor
> 2013-12-10 02:47:23.273+0000: 5939: error : virCommandWait:2308 : internal 
> error Child process (/bin/umount /mnt/41b632b5-40b3-3024-a38b-ea259c72579f) 
> status unexpected: exit status 16
> The virsh command does not return anything and hangs:
> [root@Rack2Host11 libvirt]# virsh list
> Workaround:
> If I restart libvirtd, the agent can connect to the MS.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
