I have run into the same problem that Liang Dong reported:
http://thread.gmane.org/gmane.linux.network.openvswitch.general/11704


The message body below is copied from Liang Dong's post.


Hi


We have found a very strange bug in Open vSwitch: when it is connected to a 
Cisco switch port, the port randomly gets err-disabled.


We have 76 Debian servers running Open vSwitch (2.4.0), each connected to a 
port on a Cisco Switch 3110. Every week or two, one of these switch ports ends 
up err-disabled. From the Cisco switch's perspective, the port was disabled 
because it detected a loopback: it received a keepalive message that had 
originated from that same switch port.


The keepalive message looked like this:


11:37:01.749102 e8:04:62:c8:6e:81 > e8:04:62:c8:6e:81, ethertype Loopback (0x9000), length 60: Loopback, skipCount 0, Reply, receipt number 0, data (40 octets)
0x0000: 0000 0100 0000 0000 0000 0000 0000 0000 ................
0x0010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0020: 0000 0000 0000 0000 0000 0000 0000    ..............
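
For anyone who wants to spot these frames in their own captures, here is a 
minimal Python sketch that checks the two properties visible in the capture 
above: ethertype 0x9000 and identical source and destination MAC addresses. 
The function name and the sample frame construction are my own illustration 
and are not taken from Open vSwitch or from the Cisco side.

```python
import struct

# Ethertype seen in the capture above (reported by tcpdump as "Loopback").
LOOPBACK_ETHERTYPE = 0x9000


def is_reflected_keepalive(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame looks like the keepalive above.

    Hypothetical helper: it only checks that the ethertype is 0x9000 and
    that the source MAC equals the destination MAC.
    """
    if len(frame) < 14:  # too short to hold an Ethernet header
        return False
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return ethertype == LOOPBACK_ETHERTYPE and src == dst


# Rebuild the 60-byte frame from the capture: 14-byte header + 46-byte payload.
mac = bytes.fromhex("e80462c86e81")
payload = bytes.fromhex("00000100") + bytes(42)
sample = mac + mac + struct.pack("!H", LOOPBACK_ETHERTYPE) + payload

print(is_reflected_keepalive(sample))
```

A check like this only identifies the frame; it says nothing about why Open 
vSwitch sent it back.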


Our first guess is that Open vSwitch occasionally sends the keepalive message 
it received back out of the same port, which puts the port into the 
err-disabled state. Normally Open vSwitch discards this message, but once 
every week or two, on one of the 76 servers, the message gets back to the 
Cisco switch port and the port is err-disabled.


The workarounds we are using now are either to disable keepalive messages on 
the Cisco switch or to explicitly add a flow rule on Open vSwitch that 
discards the keepalive message.
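
The exact flow rule is not given above, so the following Python sketch is only 
one possible form of the second workaround. It builds an ovs-ofctl add-flow 
command that drops every frame with ethertype 0x9000 on a bridge. The bridge 
name "br0" comes from the configuration in this message; the priority value 
and the helper name are assumptions for illustration.

```python
import shlex


def drop_keepalive_cmd(bridge: str, priority: int = 100) -> list:
    """Build an ovs-ofctl command that drops ethertype 0x9000 frames.

    Hypothetical helper; the priority of 100 is an assumed value chosen to
    sit above a default priority-0 NORMAL flow.
    """
    flow = f"priority={priority},dl_type=0x9000,actions=drop"
    return ["ovs-ofctl", "add-flow", bridge, flow]


cmd = drop_keepalive_cmd("br0")
# Print the command for review; run it by hand (or via subprocess) as root.
print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch safe to try on a 
machine without Open vSwitch installed.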


The Open vSwitch version is:
ovs-vswitchd (Open vSwitch) 2.4.0
Compiled Aug 31 2015 16:53:51


The configuration of the switch is:
  Bridge "acc_10064"
    Port "acc_10064"
      Interface "acc_10064"
        type: internal
    Port "vxnet2"
      Interface "vxnet2"
    Port "10064_88ad7aaa"
      Interface "10064_88ad7aaa-02"
        type: vxlan
        options: {key="10064", local_ip="IP1", remote_ip="IP2"}
      Interface "10064_88ad7aaa-01"
        type: vxlan
        options: {key="10064", local_ip="IP1", remote_ip="IP3"}
  Bridge "acc_10050"
    Port "10050_0977455a"
      Interface "10050_0977455a-01"
        type: vxlan
        options: {key="10050", local_ip="IP1", remote_ip="IP4"}
      Interface "10050_0977455a-02"
        type: vxlan
        options: {key="10050", local_ip="IP1", remote_ip="IP5"}
    Port "vxnet0"
      Interface "vxnet0"
    Port "acc_10050"
      Interface "acc_10050"
        type: internal
    Port "vxnet1"
      Interface "vxnet1"
  Bridge "br0"
    Port "eth0"
      Interface "eth0"
    Port "br0"
      Interface "br0"
        type: internal
  ovs_version: "2.4.0"


The kernel version is:
Linux version 3.16.0-4-amd64 (debian-kernel@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.7-ckt11-1+deb8u3 (2015-08-04)


The ovs-dpctl show output is:
system at ovs-system:
lookups: hit:536177037 missed:17196786 lost:0
flows: 182
masks: hit:1130706939 total:9 hit/pkt:2.04
port 0: ovs-system (internal)
port 1: acc_10050 (internal)
port 2: vxlan_sys_4789 (vxlan)
port 3: eth0
port 4: br0 (internal)
port 5: vxnet0
port 6: vxnet1
port 7: acc_10064 (internal)
port 8: vxnet2


Open vSwitch does not have a controller connected and is configured as a 
normal L2 switch.


We found a similar case via Google, but it is unanswered:
https://forums.gentoo.org/viewtopic-p-7884924.html?sid=12abe544bda8782c840fa5c70df6e65e


Any ideas?


Thanks,
Zhang Haoyu
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
