Sorry, I messed up the mail body, so I am resending it.
By default, OVS generates a datapath flow that drops the loopback packet, as
shown below:
recirc_id(0),in_port(3),eth(src=b0:00:b4:67:17:9b,dst=b0:00:b4:67:17:9b),eth_type(0x9000),
packets:5, bytes:300, used:0.476s, actions:drop
port(3) is the port connected to the physical NIC.
Is it possible that the loopback packet still leaves the OVS switch, i.e., is
sent back out of port(3) in this case?
We have also deployed about 500 KVM hosts with an OVS VLAN network, and this
problem happened there too.
The OVS layout on one of the hosts is shown below:
$ ovs-vsctl show
248fd0b2-6c73-4a5a-b5c6-ec4af56fe569
    Bridge br-int
        fail_mode: secure
        Port "tapa74a924e-e5"
            tag: 1
            Interface "tapa74a924e-e5"
        Port br-int
            Interface br-int
                type: internal
        Port "tapfe684b50-b0"
            tag: 1
            Interface "tapfe684b50-b0"
        Port "int-br100"
            Interface "int-br100"
                type: patch
                options: {peer="phy-br100"}
        Port "tap8d7dcd11-86"
            tag: 1
            Interface "tap8d7dcd11-86"
        Port "tapaac394c0-ae"
            tag: 1
            Interface "tapaac394c0-ae"
    Bridge "br100"
        Port "enp2s0"
            Interface "enp2s0"
        Port "br100"
            Interface "br100"
                type: internal
        Port "phy-br100"
            Interface "phy-br100"
                type: patch
                options: {peer="int-br100"}
    ovs_version: "2.4.0"
Now I plan to do the following to debug this problem:
1) Run tcpdump on enp2s0 to capture the loopback packet:
   tcpdump -i enp2s0 -e -nn "not ip and not arp" -w /home/enp2s0.pcap
2) Set up a port mirror in br100 with select_src_port set to enp2s0 and
   veth0 connected to the mirror output port, then run tcpdump on veth0:
   tcpdump -i veth0 -e -nn "not ip and not arp" -w /home/veth0.pcap
3) Set up a port mirror in br100 with select_dst_port set to enp2s0 and
   veth1 connected to the mirror output port, then run tcpdump on veth1:
   tcpdump -i veth1 -e -nn "not ip and not arp" -w /home/veth1.pcap
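The two mirrors in steps 2) and 3) could be created with a single ovs-vsctl
transaction. Below is a minimal sketch that only builds and prints the
command line. It assumes veth0 and veth1 have already been added to br100 as
ports; the mirror names nic-rx and nic-tx and the helper mirror_argv are
made up for illustration. Note that "set Bridge ... mirrors=" replaces any
mirrors already configured on the bridge.

```python
import shlex


def mirror_argv(bridge="br100", nic="enp2s0", rx_out="veth0", tx_out="veth1"):
    """Build one ovs-vsctl transaction that creates two mirrors on `bridge`.

    nic-rx: packets received on `nic` (select-src-port) go to `rx_out`.
    nic-tx: packets OVS outputs to `nic` (select-dst-port) go to `tx_out`.
    Mirror names are hypothetical; the ports must already exist on the bridge.
    """
    return [
        "ovs-vsctl",
        "--", "--id=@nic", "get", "Port", nic,
        "--", "--id=@rx", "get", "Port", rx_out,
        "--", "--id=@tx", "get", "Port", tx_out,
        "--", "--id=@m0", "create", "Mirror", "name=nic-rx",
        "select-src-port=@nic", "output-port=@rx",
        "--", "--id=@m1", "create", "Mirror", "name=nic-tx",
        "select-dst-port=@nic", "output-port=@tx",
        "--", "set", "Bridge", bridge, "mirrors=@m0,@m1",
    ]


if __name__ == "__main__":
    # Print the command for review; run it by hand (as root) once it looks right.
    print(shlex.join(mirror_argv()))
```

With this layout, veth0.pcap shows what the NIC handed to OVS, and veth1.pcap
shows what OVS decided to send back out of the NIC.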
When the physical switch port is set to err-disabled, it means the switch
received the loopback packet back from the host. At that point I will check
enp2s0.pcap, veth0.pcap, and veth1.pcap.
If the loopback packet is found in both veth0.pcap and veth1.pcap, we can
conclude that the problem is caused by OVS.
If the loopback packet is not found in veth1.pcap but is found in
enp2s0.pcap, we can conclude that the problem is caused by the physical NIC
or the NIC driver, or even the kernel.
Am I right?
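To make the comparison of the three capture files less error-prone, the check
could be scripted. The following is a rough sketch, not a tested tool: it
reads a classic pcap file (what tcpdump -w normally writes), and reports
frames whose eth_type is 0x9000 and whose source and destination MAC are
equal, as in the datapath flow above. The function name
find_loopback_frames is made up, and only a single 802.1Q tag is skipped.

```python
import struct
import sys

LOOPBACK_ETHERTYPE = 0x9000  # eth_type from the dropped datapath flow
VLAN_ETHERTYPE = 0x8100      # a single 802.1Q tag is skipped if present


def find_loopback_frames(pcap_bytes):
    """Return [(ts_sec, mac)] for frames with src == dst and eth_type 0x9000."""
    if len(pcap_bytes) < 24:
        raise ValueError("too short for a pcap file header")
    magic = pcap_bytes[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", pcap_bytes, 20)
    if linktype != 1:
        raise ValueError("expected Ethernet link type")

    hits = []
    offset = 24
    while offset + 16 <= len(pcap_bytes):
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", pcap_bytes, offset)
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        if len(frame) < 14:
            continue  # truncated or empty record
        dst, src = frame[0:6], frame[6:12]
        (eth_type,) = struct.unpack_from("!H", frame, 12)
        if eth_type == VLAN_ETHERTYPE and len(frame) >= 18:
            (eth_type,) = struct.unpack_from("!H", frame, 16)
        if eth_type == LOOPBACK_ETHERTYPE and src == dst:
            hits.append((ts_sec, src.hex(":")))
    return hits


if __name__ == "__main__":
    # Usage: python3 scan.py /home/enp2s0.pcap /home/veth0.pcap /home/veth1.pcap
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hits = find_loopback_frames(f.read())
        print(path, len(hits), "loopback frame(s)")
        for ts_sec, mac in hits:
            print("   ", ts_sec, mac)
```

Comparing the timestamps across the three files around the moment the switch
port goes err-disabled should show where the frame was turned around.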
I have run into the same problem that Liang Dong did:
http://thread.gmane.org/gmane.linux.network.openvswitch.general/11704
Then I can give you the same response.
It's really hard to debug problems that are intermittent and require
specific hardware. If you can eliminate one of those parts of the
problem, then it's easier to deal with. To attack the intermittent
part, perhaps you could make the Cisco switch send these keepalive
messages much more frequently. To attack the specific hardware part,
maybe you could reproduce this by sending similar keepalive messages in
software and demonstrate that sometimes OVS sends them back.
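As a rough sketch of the software approach, the script below builds a frame
that resembles the keepalive in the flow dump (eth_type 0x9000, source MAC
equal to destination MAC, 60 bytes, matching 300 bytes over 5 packets) and
sends it repeatedly from a Linux raw socket. The payload here is zero
padding, not the real Cisco keepalive payload, and the function names and
defaults are made up. It would have to run as root on a machine or test port
attached to the same segment as enp2s0.

```python
import socket
import struct
import sys
import time

LOOPBACK_ETHERTYPE = 0x9000


def build_keepalive(mac="b0:00:b4:67:17:9b", length=60):
    """Build an Ethernet frame with src == dst == mac and eth_type 0x9000.

    The payload is zero padding (an assumption, not the real Cisco payload).
    """
    addr = bytes.fromhex(mac.replace(":", ""))
    if len(addr) != 6:
        raise ValueError("MAC address must be 6 bytes")
    header = addr + addr + struct.pack("!H", LOOPBACK_ETHERTYPE)
    return header.ljust(length, b"\x00")


def send_keepalives(ifname, count=1000, interval=0.01):
    """Send `count` frames out of `ifname` (Linux only, needs root)."""
    frame = build_keepalive()
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        for _ in range(count):
            sock.send(frame)
            time.sleep(interval)


if __name__ == "__main__":
    # Usage: python3 keepalive.py IFNAME   (does nothing without an argument)
    if len(sys.argv) == 2:
        send_keepalives(sys.argv[1])
```

Sending these far more often than the switch does should make it quicker to
see whether a copy ever shows up on the select_dst_port mirror.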