Repository: cloudstack-docs-admin Updated Branches: refs/heads/master 095494870 -> 5744c87eb
add internet troubleshooting section Project: http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/repo Commit: http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/commit/5744c87e Tree: http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/tree/5744c87e Diff: http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/diff/5744c87e Branch: refs/heads/master Commit: 5744c87ebfc26154a5bc3af5f0481fec87e0220f Parents: 0954948 Author: Sebastien Goasguen <run...@gmail.com> Authored: Thu Mar 20 05:37:08 2014 -0400 Committer: Sebastien Goasguen <run...@gmail.com> Committed: Thu Mar 20 05:37:08 2014 -0400 ---------------------------------------------------------------------- source/events.rst | 222 +------------------- source/index.rst | 4 +- source/troubleshooting.rst | 446 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 449 insertions(+), 223 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/blob/5744c87e/source/events.rst ---------------------------------------------------------------------- diff --git a/source/events.rst b/source/events.rst index 29d927d..d2de9db 100644 --- a/source/events.rst +++ b/source/events.rst @@ -326,224 +326,4 @@ Procedure #. - Click OK. - - -TroubleShooting -=============== - -Working with Server Logs ------------------------- - -The CloudStack Management Server logs all web site, middle tier, and -database activities for diagnostics purposes in -`/var/log/cloudstack/management/`. The CloudStack logs a variety of error -messages. We recommend this command to find the problematic output in -the Management Server log:. - -.. note:: When copying and pasting a command, be sure the command has pasted as a -single line before executing. Some document viewers may introduce -unwanted line breaks in copied text. - -.. 
code:: bash - - grep -i -E 'exception|unable|fail|invalid|leak|warn|error' /var/log/cloudstack/management/management-server.log - -The CloudStack processes requests with a Job ID. If you find an error in -the logs and you are interested in debugging the issue you can grep for -this job ID in the management server log. For example, suppose that you -find the following ERROR message: - -.. code:: bash - - 2010-10-04 13:49:32,595 ERROR [cloud.vm.UserVmManagerImpl] (Job-Executor-11:job-1076) Unable to find any host for [User|i-8-42-VM-untagged] - -Note that the job ID is 1076. You can track back the events relating to -job 1076 with the following grep: - -.. code:: bash - - grep "job-1076)" management-server.log - -The CloudStack Agent Server logs its activities in `/var/log/cloudstack/agent/`. - - -Data Loss on Exported Primary Storage -------------------------------------- - -Symptom -~~~~~~~ - -Loss of existing data on primary storage which has been exposed as a -Linux NFS server export on an iSCSI volume. - -Cause -~~~~~ - -It is possible that a client from outside the intended pool has mounted -the storage. When this occurs, the LVM is wiped and all data in the -volume is lost - -Solution -~~~~~~~~ - -When setting up LUN exports, restrict the range of IP addresses that are -allowed access by specifying a subnet mask. For example: - -.. code:: bash - - echo "/export 192.168.1.0/24(rw,async,no_root_squash,no_subtree_check)" > /etc/exports - -Adjust the above command to suit your deployment needs. - -More Information -~~~~~~~~~~~~~~~~ - -See the export procedure in the "Secondary Storage" section of the -CloudStack Installation Guide - -Recovering a Lost Virtual Router --------------------------------- - -Symptom -~~~~~~~ - -A virtual router is running, but the host is disconnected. A virtual -router no longer functions as expected. - -Cause -~~~~~ - -The Virtual router is lost or down. 
- -Solution -~~~~~~~~ - -If you are sure that a virtual router is down forever, or no longer -functions as expected, destroy it. You must create one afresh while -keeping the backup router up and running (it is assumed this is in a -redundant router setup): - -- - - Force stop the router. Use the stopRouter API with forced=true - parameter to do so. - -- - - Before you continue with destroying this router, ensure that the - backup router is running. Otherwise the network connection will be - lost. - -- - - Destroy the router by using the destroyRouter API. - -Recreate the missing router by using the restartNetwork API with -cleanup=false parameter. For more information about redundant router -setup, see Creating a New Network Offering. - -For more information about the API syntax, see the API Reference at -`http://cloudstack.apache.org/docs/api/ <http://cloudstack.apache.org/docs/api/>`_. - -Maintenance mode not working on vCenter ---------------------------------------- - -Symptom -~~~~~~~ - -Host was placed in maintenance mode, but still appears live in vCenter. - -Cause -~~~~~~ - -The CloudStack administrator UI was used to place the host in scheduled -maintenance mode. This mode is separate from vCenter's maintenance mode. - -Solution -~~~~~~~~ - -Use vCenter to place the host in maintenance mode. - - -Unable to deploy VMs from uploaded vSphere template ---------------------------------------------------- - -Symptom -~~~~~~~~ - -When attempting to create a VM, the VM will not deploy. - -Cause -~~~~~ - -If the template was created by uploading an OVA file that was created -using vSphere Client, it is possible the OVA contained an ISO image. If -it does, the deployment of VMs from the template will fail. - -Solution -~~~~~~~~ - -Remove the ISO and re-upload the template. - -Unable to power on virtual machine on VMware --------------------------------------------- - -Symptom -~~~~~~~ - -Virtual machine does not power on. 
You might see errors like: - -- - - Unable to open Swap File - -- - - Unable to access a file since it is locked - -- - - Unable to access Virtual machine configuration - -Cause -~~~~~ - -A known issue on VMware machines. ESX hosts lock certain critical -virtual machine files and file systems to prevent concurrent changes. -Sometimes the files are not unlocked when the virtual machine is powered -off. When a virtual machine attempts to power on, it can not access -these critical files, and the virtual machine is unable to power on. - -Solution -~~~~~~~~ - -See the following: - -`VMware Knowledge Base -Article <http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051/>`__ - -Load balancer rules fail after changing network offering --------------------------------------------------------- - -Symptom -~~~~~~~ - -After changing the network offering on a network, load balancer rules -stop working. - -Cause -~~~~~ - -Load balancing rules were created while using a network service offering -that includes an external load balancer device such as NetScaler, and -later the network service offering changed to one that uses the -CloudStack virtual router. - -Solution -~~~~~~~~ - -Create a firewall rule on the virtual router for each of your existing -load balancing rules so that they continue to function. - - + Click OK. \ No newline at end of file http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/blob/5744c87e/source/index.rst ---------------------------------------------------------------------- diff --git a/source/index.rst b/source/index.rst index 5210991..3c89fb6 100644 --- a/source/index.rst +++ b/source/index.rst @@ -13,7 +13,7 @@ specific language governing permissions and limitations under the License. -.. CloudStack Administration Documentation documentation master file, created by +.. CloudStack Administration Documentation master file, created by sphinx-quickstart on Sat Jan 25 15:55:12 2014. 
You can adapt this file completely to your liking, but it should at least contain the root `toctree` directive. @@ -140,4 +140,4 @@ Events and Troubleshooting :maxdepth: 2 events - + troubleshooting http://git-wip-us.apache.org/repos/asf/cloudstack-docs-admin/blob/5744c87e/source/troubleshooting.rst ---------------------------------------------------------------------- diff --git a/source/troubleshooting.rst b/source/troubleshooting.rst new file mode 100644 index 0000000..37a733d --- /dev/null +++ b/source/troubleshooting.rst @@ -0,0 +1,446 @@ +.. Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information# + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +TroubleShooting +=============== + +Working with Server Logs +------------------------ + +The CloudStack Management Server logs all web site, middle tier, and +database activities for diagnostics purposes in +`/var/log/cloudstack/management/`. The CloudStack logs a variety of error +messages. We recommend this command to find the problematic output in +the Management Server log:. + +.. note:: When copying and pasting a command, be sure the command has pasted as a +single line before executing. Some document viewers may introduce +unwanted line breaks in copied text. + +.. 
code:: bash + + grep -i -E 'exception|unable|fail|invalid|leak|warn|error' /var/log/cloudstack/management/management-server.log + +The CloudStack processes requests with a Job ID. If you find an error in +the logs and you are interested in debugging the issue you can grep for +this job ID in the management server log. For example, suppose that you +find the following ERROR message: + +.. code:: bash + + 2010-10-04 13:49:32,595 ERROR [cloud.vm.UserVmManagerImpl] (Job-Executor-11:job-1076) Unable to find any host for [User|i-8-42-VM-untagged] + +Note that the job ID is 1076. You can track back the events relating to +job 1076 with the following grep: + +.. code:: bash + + grep "job-1076)" management-server.log + +The CloudStack Agent Server logs its activities in `/var/log/cloudstack/agent/`. + + +Data Loss on Exported Primary Storage +------------------------------------- + +Symptom +~~~~~~~ + +Loss of existing data on primary storage which has been exposed as a +Linux NFS server export on an iSCSI volume. + +Cause +~~~~~ + +It is possible that a client from outside the intended pool has mounted +the storage. When this occurs, the LVM is wiped and all data in the +volume is lost + +Solution +~~~~~~~~ + +When setting up LUN exports, restrict the range of IP addresses that are +allowed access by specifying a subnet mask. For example: + +.. code:: bash + + echo "/export 192.168.1.0/24(rw,async,no_root_squash,no_subtree_check)" > /etc/exports + +Adjust the above command to suit your deployment needs. + +More Information +~~~~~~~~~~~~~~~~ + +See the export procedure in the "Secondary Storage" section of the +CloudStack Installation Guide + +Recovering a Lost Virtual Router +-------------------------------- + +Symptom +~~~~~~~ + +A virtual router is running, but the host is disconnected. A virtual +router no longer functions as expected. + +Cause +~~~~~ + +The Virtual router is lost or down. 
+ +Solution +~~~~~~~~ + +If you are sure that a virtual router is down forever, or no longer +functions as expected, destroy it. You must create one afresh while +keeping the backup router up and running (it is assumed this is in a +redundant router setup): + +- + + Force stop the router. Use the stopRouter API with forced=true + parameter to do so. + +- + + Before you continue with destroying this router, ensure that the + backup router is running. Otherwise the network connection will be + lost. + +- + + Destroy the router by using the destroyRouter API. + +Recreate the missing router by using the restartNetwork API with +cleanup=false parameter. For more information about redundant router +setup, see Creating a New Network Offering. + +For more information about the API syntax, see the API Reference at +`http://cloudstack.apache.org/docs/api/ <http://cloudstack.apache.org/docs/api/>`_. + +Maintenance mode not working on vCenter +--------------------------------------- + +Symptom +~~~~~~~ + +Host was placed in maintenance mode, but still appears live in vCenter. + +Cause +~~~~~~ + +The CloudStack administrator UI was used to place the host in scheduled +maintenance mode. This mode is separate from vCenter's maintenance mode. + +Solution +~~~~~~~~ + +Use vCenter to place the host in maintenance mode. + + +Unable to deploy VMs from uploaded vSphere template +--------------------------------------------------- + +Symptom +~~~~~~~~ + +When attempting to create a VM, the VM will not deploy. + +Cause +~~~~~ + +If the template was created by uploading an OVA file that was created +using vSphere Client, it is possible the OVA contained an ISO image. If +it does, the deployment of VMs from the template will fail. + +Solution +~~~~~~~~ + +Remove the ISO and re-upload the template. + +Unable to power on virtual machine on VMware +-------------------------------------------- + +Symptom +~~~~~~~ + +Virtual machine does not power on. 
You might see errors like: + +- + + Unable to open Swap File + +- + + Unable to access a file since it is locked + +- + + Unable to access Virtual machine configuration + +Cause +~~~~~ + +A known issue on VMware machines. ESX hosts lock certain critical +virtual machine files and file systems to prevent concurrent changes. +Sometimes the files are not unlocked when the virtual machine is powered +off. When a virtual machine attempts to power on, it can not access +these critical files, and the virtual machine is unable to power on. + +Solution +~~~~~~~~ + +See the following: + +`VMware Knowledge Base +Article <http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=10051/>`__ + +Load balancer rules fail after changing network offering +-------------------------------------------------------- + +Symptom +~~~~~~~ + +After changing the network offering on a network, load balancer rules +stop working. + +Cause +~~~~~ + +Load balancing rules were created while using a network service offering +that includes an external load balancer device such as NetScaler, and +later the network service offering changed to one that uses the +CloudStack virtual router. + +Solution +~~~~~~~~ + +Create a firewall rule on the virtual router for each of your existing +load balancing rules so that they continue to function. + +Troubleshooting Internet Traffic +-------------------------------- + +Below are a few troubleshooting steps to check what's going wrong with your +network. + +Troubleshooting Steps +~~~~~~~~~~~~~~~~~~~~~~ + +#. The switches have to be configured correctly to pass VLAN traffic. You can + verify that VLAN traffic is working by bringing up a tagged interface on the + hosts and pinging between them, as shown below: 
+ + On *host1 (kvm1)* + + :: + + kvm1 ~$ vconfig add eth0 64 + kvm1 ~$ ifconfig eth0.64 1.2.3.4 netmask 255.255.255.0 up + kvm1 ~$ ping 1.2.3.5 + + On *host2 (kvm2)* + + :: + + kvm2 ~$ vconfig add eth0 64 + kvm2 ~$ ifconfig eth0.64 1.2.3.5 netmask 255.255.255.0 up + kvm2 ~$ ping 1.2.3.4 + + If the pings don't work, run *tcpdump(8)* on each interface along the path to check + where the packets are being dropped. Ultimately, if the switches are not + configured correctly, CloudStack networking won't work, so fix the + physical networking issues before you proceed to the next steps. + +#. Ensure `Traffic Labels <http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.2.0/html/Installation_Guide/about-physical-networks.html>`_ are set for the Zone. + + Traffic labels need to be set for all hypervisors including + XenServer, KVM and VMware types. You can configure traffic labels when + you create a new zone from the *Add Zone Wizard*. + + .. image:: ../_static/images/networking-zone-traffic-labels.png + + On an existing zone, you can modify the traffic labels by going to + the *Infrastructure, Zones, Physical Network* tab. + + .. 
image:: ../_static/images/networking-infra-traffic-labels.png + + List the labels using *CloudMonkey*: + + :: + + acs-manager ~$ cloudmonkey list traffictypes physicalnetworkid=41cb7ff6-8eb2-4630-b577-1da25e0e1145 + count = 4 + traffictype: + id = cd0915fe-a660-4a82-9df7-34aebf90003e + kvmnetworklabel = cloudbr0 + physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 + traffictype = Guest + xennetworklabel = MGMT + ======================================================== + id = f5524b8f-6605-41e4-a982-81a356b2a196 + kvmnetworklabel = cloudbr0 + physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 + traffictype = Management + xennetworklabel = MGMT + ======================================================== + id = 266bad0e-7b68-4242-b3ad-f59739346cfd + kvmnetworklabel = cloudbr0 + physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 + traffictype = Public + xennetworklabel = MGMT + ======================================================== + id = a2baad4f-7ce7-45a8-9caf-a0b9240adf04 + kvmnetworklabel = cloudbr0 + physicalnetworkid = 41cb7ff6-8eb2-4630-b577-1da25e0e1145 + traffictype = Storage + xennetworklabel = MGMT + ========================================================= + +#. KVM traffic labels must be named *"cloudbr0"*, *"cloudbr2"*, + *"cloudbrN"* etc., and the corresponding bridge must exist on the KVM + hosts. If you create labels/bridges with any other names, CloudStack + seems to ignore them (at least earlier versions did). CloudStack does not + create the physical bridges on the KVM hosts; you need to create them + **before** adding the host to CloudStack. 
+ + :: + + kvm1 ~$ ifconfig cloudbr0 + cloudbr0 Link encap:Ethernet HWaddr 00:0C:29:EF:7D:78 + inet addr:192.168.44.22 Bcast:192.168.44.255 Mask:255.255.255.0 + inet6 addr: fe80::20c:29ff:feef:7d78/64 Scope:Link + UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 + RX packets:92435 errors:0 dropped:0 overruns:0 frame:0 + TX packets:50596 errors:0 dropped:0 overruns:0 carrier:0 + collisions:0 txqueuelen:0 + RX bytes:94985932 (90.5 MiB) TX bytes:61635793 (58.7 MiB) + +#. The *public* interfaces of the Virtual Router, SSVM and CPVM are bridged to + a physical interface on the host. In the example below, *cloudbr0* is + the public interface and CloudStack has correctly created the virtual + interface bridge. This virtual-to-physical interface mapping + is done automatically by CloudStack using the traffic label settings for + the Zone. If you have provided correct settings and still don't have a + working Internet connection, check the switching layer before you debug any + further. You can verify traffic using tcpdump on the virtual, physical + and bridge interfaces. + + :: + + kvm-host1 ~$ brctl show + bridge name bridge id STP enabled interfaces + breth0-64 8000.000c29ef7d78 no eth0.64 + vnet2 + cloud0 8000.fe00a9fe0219 no vnet0 + cloudbr0 8000.000c29ef7d78 no eth0 + vnet1 + vnet3 + virbr0 8000.5254008e321a yes virbr0-nic + + :: + + xenserver1 ~$ brctl show + bridge name bridge id STP enabled interfaces + xapi0 0000.e2b76d0a1149 no vif1.0 + xenbr0 0000.000c299b54dc no eth0 + xapi1 + vif1.1 + vif1.2 + +#. Pre-create labels on the XenServer Hosts. As with the KVM bridge + setup, traffic labels must also be pre-created on the XenServer hosts + before adding them to CloudStack. + + :: + + xenserver1 ~$ xe network-list + uuid ( RO) : aaa-bbb-ccc-ddd + name-label ( RW): MGMT + name-description ( RW): + bridge ( RO): xenbr0 + + +#. The Internet should be accessible from both the SSVM and CPVM + instances by default. 
Their public IPs will also be directly pingable + from the Internet. Please note that these tests work only if your + switches and traffic labels are configured correctly for your + environment. If your SSVM/CPVM can't reach the Internet, it's very + unlikely that the Virtual Router (VR) can reach the Internet either, + which suggests that it's either a switching issue or incorrectly assigned + traffic labels. Fix the SSVM/CPVM issues before you debug VR issues. + + :: + + root@s-1-VM:~# ping -c 3 google.com + PING google.com (74.125.236.164): 56 data bytes + 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=26.932 ms + 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=29.156 ms + 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=25.000 ms + --- google.com ping statistics --- + 3 packets transmitted, 3 packets received, 0% packet loss + round-trip min/avg/max/stddev = 25.000/27.029/29.156/1.698 ms + + :: + + root@v-2-VM:~# ping -c 3 google.com + PING google.com (74.125.236.164): 56 data bytes + 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=32.125 ms + 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=26.324 ms + 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=37.001 ms + --- google.com ping statistics --- + 3 packets transmitted, 3 packets received, 0% packet loss + round-trip min/avg/max/stddev = 26.324/31.817/37.001/4.364 ms + + +#. The Virtual Router (VR) should also be able to reach the Internet + without having any Egress rules. The Egress rules only control forwarded + traffic and not traffic that originates on the VR itself. 
+ + :: + + root@r-4-VM:~# ping -c 3 google.com + PING google.com (74.125.236.164): 56 data bytes + 64 bytes from 74.125.236.164: icmp_seq=0 ttl=55 time=28.098 ms + 64 bytes from 74.125.236.164: icmp_seq=1 ttl=55 time=34.785 ms + 64 bytes from 74.125.236.164: icmp_seq=2 ttl=55 time=69.179 ms + --- google.com ping statistics --- + 3 packets transmitted, 3 packets received, 0% packet loss + round-trip min/avg/max/stddev = 28.098/44.021/69.179/17.998 ms + +#. However, the Virtual Router's (VR) Source NAT Public IP address + **WON'T** be reachable until appropriate Ingress rules are + in place. You can add *Ingress* rules on the *Network, Guest Network, IP + Address, Firewall* settings page. + + .. image:: ../_static/images/networking-ingress-rule.png + +#. The VM Instances by default won't be able to access the Internet. Add + Egress rules to permit traffic. + + .. image:: ../_static/images/networking-egress-rule.png + +#. Some users have reported that flushing iptables rules (or changing + routes) on the SSVM, CPVM or the Virtual Router makes the Internet work. + This is not expected behaviour and suggests that your networking + settings are incorrect. No iptables/route changes are required on the + SSVM, CPVM or the VR. Go back and double check all your settings. + + +In the vast majority of cases, the problem has turned out to be at the +switching layer where the L3 switches were configured incorrectly. + +This section was contributed by Shanker Balan and was originally published on `ShapeBlue's blog <http://shankerbalan.net/blog/internet-not-working-on-cloudstack-vms/>`_ +
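The job ID lookup described under "Working with Server Logs" in the added troubleshooting.rst can be wrapped in two small shell helpers. This is only a sketch: the function names `list_error_jobs` and `trace_job` are hypothetical and not part of CloudStack, and it assumes the log lines carry the `(Job-Executor-N:job-ID)` marker shown in the sample ERROR message.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not CloudStack tooling.

# Print the unique job IDs (e.g. job-1076) that appear on ERROR lines.
# $1: path to a management server log file
list_error_jobs() {
    # -o prints only the matched text; the trailing ")" is then stripped.
    grep ' ERROR ' "$1" | grep -oE 'job-[0-9]+\)' | tr -d ')' | sort -u
}

# Print every log line that belongs to one job.
# $1: job ID such as job-1076    $2: path to the log file
trace_job() {
    # -F matches the literal string "job-1076)", so job-10760 is not matched.
    grep -F "$1)" "$2"
}
```

After sourcing the helpers, `list_error_jobs /var/log/cloudstack/management/management-server.log` would list the candidate job IDs, and `trace_job job-1076 /var/log/cloudstack/management/management-server.log` would print the events for one of them. Keeping the closing parenthesis in the match mirrors the `grep "job-1076)"` command from the documentation.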