Thank you very much! I upgraded to the latest kernel 2.6.32-504.16.2.el6.x86_64 for 200 servers and so far, this problem(the ovs-vswitchd hang) has not happened. I seems to be related to this bug(http://bugs.centos.org/view.php?id=8703). But, I'm not sure!
From: Joe Stringer Date: 2015-05-28 09:55 To: yangwanli CC: bugs Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll interval (365882ms user, 464906ms system) >Hmm. I have a few of different paths that might help: > 1) Can you get dmesg and/or syslog around the time of the event? Not find errors or warning in dmesg and syslog about ovs-vswitchd. > 2) Does ovs-vswitchd crash or hang? Hang, I think so! Because when I excute netstat -anp, I find some unix sockets state is CONNECTING. ovs-ofctl dump-flows br-int, not always reponse. Recently, I found the ovs-vswitchd process of a server later changed to normal, about 20 minutes. I do nothing! ovs-ofctl dump-flows br-int is OK! > A backtrace of the code that ovs-vswitchd is executing would be really > useful. If it crashes, can you get a core dump and use gdb to get a > backtrace? If it hangs, perhaps you could attach gdb and print a > backtrace and post that? > 3) If you're already compiling your own OVS-2.3, you could try > downloading the latest branch-2.3 from here: > https://github.com/openvswitch/ovs/tree/branch-2.3 > There are a few bugfixes that have not made it into a new release yet. OK! I will try it. 4) Is OVS running on the bare metal or is it running within a VM? bare metal > 5) Lastly, it's interesting that there is a delay of around ~2.5 hours > (or 10,000 seconds) between the "unreasonably long poll interval" > messages. Maybe this means something, maybe it's irrelevant. On 26 May 2015 at 18:46, yangwanli <[email protected]> wrote: > I don't enable STP. > > #ovs-vsctl show > 350424e8-f03c-4f5c-aaa9-fd78ec78f97b > Bridge br-int > fail_mode: secure > Port "int-br-eth1" > Interface "int-br-eth1" > type: patch > options: {peer="phy-br-eth1"} > Port "tap5c0d4002-26" > tag: 1 > Interface "tap5c0d4002-26" > Port "tap4954ce53-06" > tag: 3 > Interface "tap4954ce53-06" > Port "tap6631c08d-91" > tag: 3 > Interface "tap6631c08d-91" > Port "tapdfd3962f-3a" > tag: 2 > Interface "tapdfd3962f-3a" > Port br-int > Interface br-int > type: internal > Port "tapb2cd1333-03" > tag: 1 > Interface "tapb2cd1333-03" > Port "tap0c1153d7-bb" > tag: 1 > Interface "tap0c1153d7-bb" > Bridge "br-eth1" > Port "br-eth1" > Interface "br-eth1" > type: internal > Port "eth3" > Interface "eth3" > Port "phy-br-eth1" > Interface "phy-br-eth1" > type: patch > options: {peer="int-br-eth1"} > ovs_version: "2.3.1" > > > eth3 is physical NIC. > The bidge of OVS is created by: > ovs-vsctl add-br br-int > ovs-vsctl add-br br-eth1 > ovs-vsctl --timeout=10 -- set-fail-mode br-int secure > > ________________________________ > > WanLi Yang > > JD.COM > > ***************************************************************** > > > From: Joe Stringer > Date: 2015-05-27 05:05 > To: yangwanli > CC: bugs > Subject: Re: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll > interval (365882ms user, 464906ms system) > Are you using STP? > > On 22 May 2015 at 21:54, yangwanli <[email protected]> wrote: >> Thank you for your response! >> >> My servers OS are all CentOS 6.6, kernel version: 2.6.32-504.el6.x86_64, >> vCPU:32, RAM:256G >> #lscpu >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Byte Order: Little Endian >> CPU(s): 32 >> On-line CPU(s) list: 0-31 >> Thread(s) per core: 2 >> Core(s) per socket: 8 >> Socket(s): 2 >> NUMA node(s): 2 >> Vendor ID: GenuineIntel >> CPU family: 6 >> Model: 63 >> Stepping: 2 >> CPU MHz: 2599.971 >> BogoMIPS: 5199.24 >> Virtualization: VT-x >> L1d cache: 32K >> L1i cache: 32K >> L2 cache: 256K >> L3 cache: 20480K >> NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30 >> NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31 >> >> OpenvSwitch kernel module(openvswitch.ko) and user space utilities are all >> build I use OpenvSwitch-2.3.1 source (change nothing ) >> (http://openvswitch.org/releases/openvswitch-2.3.1.tar.gz) >> #modinfo openvswitch >> /lib/modules/2.6.32-504.el6.x86_64/kernel/net/openvswitch/openvswitch.ko >> version: 2.3.1 >> license: GPL >> description: Open vSwitch switching datapath >> srcversion: 4D7CD38A83A9A4A782F73A1 >> depends: libcrc32c >> vermagic: 2.6.32-504.el6.x86_64 SMP mod_unload modversions >> parm: vlan_tso:Enable TSO for VLAN packets (int) >> >> I have 300 servers are all the same hardware , OS and OpenvSwitch-2.3.1. >> Since May 15, 2015, There are 6 servers appear this problem that when I >> execute ovs-ofctl dump-flows br-int , not always reponse, >> OVS ports not transmit datas, led to the virtual machines cannot be >> accessed. >> When appear this problem , server load is not high, and while I excute >> netstat -anp , find some CONNECTING state of unix socket. >> I restart ovs-vswitchd and ovdb-server, becames normal. >> I doubt the thread dealock of ovs-vswitchd. >> >> All servers have the following log info: >> /var/log/openvswitch/ovs-vswitd.log: >> 2015-05-18T12:13:04.491Z|00138|coverage|INFO|Event coverage, avg rate over >> last: 5 seconds, last minute, last hour, hash=9e13ffc8: >> 2015-05-18T12:13:04.491Z|00139|coverage|INFO|nln_changed >> 0.0/sec 0.000/sec 0.0000/sec total: 21 >> 2015-05-18T12:13:04.491Z|00140|coverage|INFO|netlink_received >> 59.4/sec 53.850/sec 53.6072/sec total: 1159232 >> 2015-05-18T12:13:04.491Z|00141|coverage|INFO|netlink_recv_jumbo >> 3.8/sec 2.767/sec 2.6536/sec total: 53239 >> 2015-05-18T12:13:04.491Z|00142|coverage|INFO|netlink_sent >> 68.0/sec 62.783/sec 62.9394/sec total: 1353375 >> 2015-05-18T12:13:04.491Z|00143|coverage|INFO|netdev_set_policing >> 0.0/sec 0.000/sec 0.0000/sec total: 12 >> 2015-05-18T12:13:04.491Z|00144|coverage|INFO|netdev_get_ifindex >> 0.0/sec 0.000/sec 0.0000/sec total: 6 >> 2015-05-18T12:13:04.491Z|00145|coverage|INFO|netdev_get_hwaddr >> 0.0/sec 0.000/sec 0.0000/sec total: 6 >> 2015-05-18T12:13:04.491Z|00146|coverage|INFO|netdev_set_hwaddr >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> 2015-05-18T12:13:04.491Z|00147|coverage|INFO|netdev_get_ethtool >> 0.0/sec 0.000/sec 0.0000/sec total: 22 >> 2015-05-18T12:13:04.491Z|00148|coverage|INFO|netdev_set_ethtool >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> 2015-05-18T12:13:04.491Z|00149|coverage|INFO|vconn_received >> 2.4/sec 2.000/sec 1.9067/sec total: 41642 >> 2015-05-18T12:13:04.491Z|00150|coverage|INFO|vconn_sent >> 1.8/sec 1.500/sec 1.4300/sec total: 31228 >> 2015-05-18T12:13:04.491Z|00151|coverage|INFO|util_xalloc >> 1987.2/sec 1867.150/sec 1817.8008/sec total: 37808073 >> 2015-05-18T12:13:04.491Z|00152|coverage|INFO|pstream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 5 >> 2015-05-18T12:13:04.491Z|00153|coverage|INFO|stream_open >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> 2015-05-18T12:13:04.491Z|00154|coverage|INFO|rconn_queued >> 1.2/sec 1.000/sec 0.9533/sec total: 20815 >> 2015-05-18T12:13:04.491Z|00155|coverage|INFO|rconn_sent >> 1.2/sec 1.000/sec 0.9533/sec total: 20815 >> 2015-05-18T12:13:04.491Z|00156|coverage|INFO|poll_fd_wait >> 537.2/sec 481.217/sec 462.1683/sec total: 9762985 >> 2015-05-18T12:13:04.491Z|00157|coverage|INFO|poll_zero_timeout >> 5.2/sec 4.333/sec 3.8581/sec total: 83823 >> 2015-05-18T12:13:04.491Z|00158|coverage|INFO|txn_unchanged >> 0.0/sec 0.000/sec 0.0000/sec total: 11 >> 2015-05-18T12:13:04.491Z|00159|coverage|INFO|txn_incomplete >> 8.8/sec 8.450/sec 8.3044/sec total: 170811 >> 2015-05-18T12:13:04.491Z|00160|coverage|INFO|txn_success >> 0.2/sec 0.200/sec 0.1989/sec total: 4276 >> 2015-05-18T12:13:04.491Z|00161|coverage|INFO|netdev_get_stats >> 1.6/sec 1.600/sec 1.5911/sec total: 32833 >> 2015-05-18T12:13:04.491Z|00162|coverage|INFO|mac_learning_learned >> 0.2/sec 0.133/sec 0.0686/sec total: 1595 >> 2015-05-18T12:13:04.491Z|00163|coverage|INFO|mac_learning_expired >> 0.4/sec 0.150/sec 0.0689/sec total: 1518 >> 2015-05-18T12:13:04.491Z|00164|coverage|INFO|hmap_pathological >> 0.0/sec 0.000/sec 0.0000/sec total: 27 >> 2015-05-18T12:13:04.491Z|00165|coverage|INFO|hmap_expand >> 102.4/sec 92.300/sec 88.6772/sec total: 1853939 >> 2015-05-18T12:13:04.491Z|00166|coverage|INFO|flow_extract >> 23.0/sec 19.883/sec 19.7067/sec total: 416747 >> 2015-05-18T12:13:04.491Z|00167|coverage|INFO|dpif_port_add >> 0.0/sec 0.000/sec 0.0000/sec total: 6 >> 2015-05-18T12:13:04.491Z|00168|coverage|INFO|dpif_flow_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 4 >> 2015-05-18T12:13:04.491Z|00169|coverage|INFO|dpif_flow_put >> 0.0/sec 0.000/sec 0.0000/sec total: 2 >> 2015-05-18T12:13:04.491Z|00170|coverage|INFO|dpif_flow_del >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> 2015-05-18T12:13:04.491Z|00171|coverage|INFO|dpif_execute >> 0.0/sec 0.000/sec 0.0000/sec total: 1 >> 2015-05-18T12:13:04.491Z|00172|coverage|INFO|xlate_actions >> 38.8/sec 27.317/sec 22.9731/sec total: 507271 >> 2015-05-18T12:13:04.491Z|00173|coverage|INFO|rev_reconfigure >> 0.0/sec 0.000/sec 0.0000/sec total: 20 >> 2015-05-18T12:13:04.491Z|00174|coverage|INFO|rev_port_toggled >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> 2015-05-18T12:13:04.491Z|00175|coverage|INFO|rev_flow_table >> 0.0/sec 0.000/sec 0.0000/sec total: 11 >> 2015-05-18T12:13:04.491Z|00176|coverage|INFO|rev_mac_learning >> 0.6/sec 0.283/sec 0.1286/sec total: 2813 >> 2015-05-18T12:13:04.491Z|00177|coverage|INFO|ofproto_flush >> 0.0/sec 0.000/sec 0.0000/sec total: 3 >> 2015-05-18T12:13:04.491Z|00178|coverage|INFO|ofproto_recv_openflow >> 1.8/sec 1.500/sec 1.4300/sec total: 31229 >> 2015-05-18T12:13:04.491Z|00179|coverage|INFO|ofproto_update_port >> 0.0/sec 0.000/sec 0.0000/sec total: 34 >> 2015-05-18T12:13:04.491Z|00180|coverage|INFO|bridge_reconfigure >> 0.0/sec 0.000/sec 0.0000/sec total: 15 >> 2015-05-18T12:13:04.491Z|00181|coverage|INFO|49 events never hit >> 2015-05-18T14:59:05.014Z|00182|timeval|WARN|Unreasonably long 2019ms poll >> interval (43448ms user, 50465ms system) >> 2015-05-18T14:59:05.014Z|00183|timeval|WARN|faults: 840 minor, 0 major >> 2015-05-18T14:59:05.014Z|00184|timeval|WARN|disk: 0 reads, 120 writes >> 2015-05-18T14:59:05.014Z|00185|timeval|WARN|context switches: 2167382 >> voluntary, 529 involuntary >> 2015-05-18T14:59:05.014Z|00186|coverage|INFO|Skipping details of duplicate >> event coverage for hash=9e13ffc8 >> 2015-05-18T17:46:50.578Z|00187|timeval|WARN|Unreasonably long 2018ms poll >> interval (56724ms user, 65745ms system) >> 2015-05-18T17:46:50.578Z|00188|timeval|WARN|faults: 841 minor, 0 major >> 2015-05-18T17:46:50.578Z|00189|timeval|WARN|disk: 0 reads, 128 writes >> 2015-05-18T17:46:50.578Z|00190|timeval|WARN|context switches: 2853218 >> voluntary, 538 involuntary >> 2015-05-18T17:46:50.578Z|00191|coverage|INFO|Skipping details of duplicate >> event coverage for hash=9e13ffc8 >> 2015-05-18T20:29:36.092Z|00192|timeval|WARN|Unreasonably long 2018ms poll >> interval (69561ms user, 80775ms system) >> 2015-05-18T20:29:36.092Z|00193|timeval|WARN|faults: 841 minor, 0 major >> 2015-05-18T20:29:36.092Z|00194|timeval|WARN|disk: 0 reads, 136 writes >> 2015-05-18T20:29:36.092Z|00195|timeval|WARN|context switches: 3519924 >> voluntary, 544 involuntary >> 2015-05-18T20:29:36.092Z|00196|coverage|INFO|Skipping details of duplicate >> event coverage for hash=9e13ffc8 >> 2015-05-18T23:21:51.729Z|00197|timeval|WARN|Unreasonably long 2021ms poll >> interval (83029ms user, 96728ms system) >> 2015-05-18T23:21:51.729Z|00198|timeval|WARN|faults: 841 minor, 0 major >> 2015-05-18T23:21:51.729Z|00199|timeval|WARN|disk: 0 reads, 144 writes >> 2015-05-18T23:21:51.729Z|00200|timeval|WARN|context switches: 4224452 >> voluntary, 553 involuntary >> 2015-05-18T23:21:51.729Z|00201|coverage|INFO|Skipping details of duplicate >> event coverage for hash=9e13ffc8 >> >> ________________________________ >> >> WanLi Yang >> >> JD.COM >> >> ***************************************************************** >> >> >> From: Joe Stringer >> Date: 2015-05-23 05:36 >> To: yangwanli >> CC: bugs >> Subject: Re: [ovs-discuss] |timeval|WARN|Unreasonably long 2020ms poll >> interval (365882ms user, 464906ms system) >> On 19 May 2015 at 03:32, yangwanli <[email protected]> wrote: >>> Hi, >>> I'm a cloud computing R&D engineer. Use OVS as virtual switch, all >>> virtual >>> machines and containers communicate with OVS. >>> I use OVS VLAN. >>> OpenvSwitch Version 2.3.1 (nothing change) >>> CentOS 6.6 >>> In the process of I use, encounter the flowing problems. >>> Need your help! >>> >>> 1、ovs-vswitchd.log always report warnings: >>> >>> 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll >>> interval (365882ms user, 464906ms system) >>> 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major >>> 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes >>> 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142 >>> voluntary, 2080 involuntary >>> 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of >>> duplicate >>> event coverage for hash=35d02982 >>> 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll >>> interval (380306ms user, 483259ms system) >>> 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major >>> 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes >>> 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734 >>> voluntary, 2174 involuntary >>> 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of >>> duplicate >>> event coverage for hash=35d02982 >> >> The main loop of ovs-vswitchd usually expects to cycle at least once >> per second. This message appears when this takes longer than usual, >> which may indicate that ovs-vswitchd is under heavy load, or some sort >> of bug. >> >>> >>> 2、I also encounter this problem: >>> when I execute >>> ovs-ofctl dump-flows br-int >>> not always reponse, but ovs-vsctl show is OK! >> >> The main thread of ovs-vswitchd assembles responses to OpenFlow >> requests, and given the above messages it sounds like it's busy doing >> other things. Ovs-vsctl on the other hand talks to ovsdb-server, which >> looks like it is OK in this scenario. >> >> These both seem like symptoms, but it's not obvious to me what the >> underlying cause is. Is there anything else in the logs? Is >> ovs-vswitchd using a lot of CPU? >> >> Are you using the OVS kernel module from CentOS or one distributed >> with the openvswitch package? ('modinfo openvswitch' should give a >> hint here) >> >> On 19 May 2015 at 03:32, yangwanli <[email protected]> wrote: >>> Hi, >>> I'm a cloud computing R&D engineer. Use OVS as virtual switch, all >>> virtual >>> machines and containers communicate with OVS. >>> I use OVS VLAN. >>> OpenvSwitch Version 2.3.1 (nothing change) >>> CentOS 6.6 >>> In the process of I use, encounter the flowing problems. >>> Need your help! >>> >>> 1、ovs-vswitchd.log always report warnings: >>> >>> 2015-05-17T00:52:50.463Z|00575|timeval|WARN|Unreasonably long 2020ms poll >>> interval (365882ms user, 464906ms system) >>> 2015-05-17T00:52:50.463Z|00576|timeval|WARN|faults: 708 minor, 0 major >>> 2015-05-17T00:52:50.463Z|00577|timeval|WARN|disk: 0 reads, 392 writes >>> 2015-05-17T00:52:50.463Z|00578|timeval|WARN|context switches: 19571142 >>> voluntary, 2080 involuntary >>> 2015-05-17T00:52:50.463Z|00579|coverage|INFO|Skipping details of >>> duplicate >>> event coverage for hash=35d02982 >>> 2015-05-17T03:26:36.336Z|00580|timeval|WARN|Unreasonably long 2020ms poll >>> interval (380306ms user, 483259ms system) >>> 2015-05-17T03:26:36.336Z|00581|timeval|WARN|faults: 708 minor, 0 major >>> 2015-05-17T03:26:36.336Z|00582|timeval|WARN|disk: 0 reads, 400 writes >>> 2015-05-17T03:26:36.336Z|00583|timeval|WARN|context switches: 20299734 >>> voluntary, 2174 involuntary >>> 2015-05-17T03:26:36.336Z|00584|coverage|INFO|Skipping details of >>> duplicate >>> event coverage for hash=35d02982 >>> >>> 2、I also encounter this problem: >>> when I execute >>> ovs-ofctl dump-flows br-int >>> not always reponse, but ovs-vsctl show is OK! >>> >>> Thanks! >>> WanLi Yang >>> >>> >>> ________________________________ >>> >>> WanLi Yang >>> >>> JD.COM >>> >>> ***************************************************************** >>> >>> >>> _______________________________________________ >>> discuss mailing list >>> [email protected] >>> http://openvswitch.org/mailman/listinfo/discuss >>>
_______________________________________________ discuss mailing list [email protected] http://openvswitch.org/mailman/listinfo/discuss
