Hi,

I'm running into an SR-IOV VM packet drop problem in my lab test and am not
sure where to dig in or where to ask, so I hope this is the right mailing
list; any direction would be appreciated.

Here are my lab details.

Dell server:

PowerEdge R710 + Intel 82599ES 10-Gigabit SFI/SFP+ (dual port) + Ubuntu 16.04


SR-IOV enabled:

10: enp4s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000
    link/ether e8:ea:6a:06:1b:1a brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 4a:d0:eb:c8:76:ea, spoof checking on, link-state auto
    vf 1 MAC 52:54:00:eb:39:4a, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
11: enp4s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
mode DEFAULT group default qlen 1000
    link/ether e8:ea:6a:06:1b:1b brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 6e:8c:99:84:e2:80, spoof checking on, link-state auto
    vf 1 MAC 52:54:00:6a:d4:05, spoof checking on, link-state auto
    vf 2 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
    vf 3 MAC 00:00:00:00:00:00, spoof checking on, link-state auto
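(In case it matters, the VFs were created on the two ixgbe PFs roughly as
sketched below -- 4 VFs per port, matching the 'ip link' output above; treat
this as a sketch rather than my exact commands.)

# echo 4 > /sys/class/net/enp4s0f0/device/sriov_numvfs
# echo 4 > /sys/class/net/enp4s0f1/device/sriov_numvfs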

Three VMs are provisioned with SR-IOV VFs:

# virsh list --all
 Id    Name                  State
----------------------------------------------------
 1     bigip-sriov           running    <==== F5 BIG-IP VE
 2     sriov-enp4s0f0-vf1    running    <==== Ubuntu 14.04 VM running the iperf3 client
 3     sriov-enp4s0f1-vf1    running    <==== Ubuntu 14.04 VM running the iperf3 server

The VF assignment to the guests is:

BIG-IP VE: enp4s0f0 vf0, enp4s0f1 vf0

Ubuntu VM iperf3 client: enp4s0f0 vf1

Ubuntu VM iperf3 server: enp4s0f1 vf1
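Each VF is handed to its guest as a libvirt hostdev-type interface. A trimmed
sketch of the relevant domain XML for the iperf3 client VM is below; the PCI
address shown is a placeholder rather than the exact VF address on my box,
but the MAC matches vf 1 of enp4s0f0 above:

    <interface type='hostdev' managed='yes'>
      <mac address='52:54:00:eb:39:4a'/>
      <source>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x10' function='0x1'/>
      </source>
    </interface>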

The packet path is below; the BIG-IP VE simply forwards the packets:

iperf3 client <-----> BIG-IP VE <-----> iperf3 server


Here is the iperf3 run showing the retransmissions and sporadic packet drops:

# ./iperf3 -c 10.2.72.66
Connecting to host 10.2.72.66, port 5201
[  4] local 10.1.72.16 port 44701 connected to 10.2.72.66 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   109 MBytes   912 Mbits/sec    0   1.34 MBytes
[  4]   1.00-2.00   sec   110 MBytes   924 Mbits/sec    0   1.34 MBytes
[  4]   2.00-3.00   sec   109 MBytes   915 Mbits/sec    0   1.34 MBytes
[  4]   3.00-4.00   sec   108 MBytes   906 Mbits/sec  237    959 KBytes   <==== retransmissions
[  4]   4.00-5.00   sec   116 MBytes   970 Mbits/sec    0   1.05 MBytes
[  4]   5.00-6.00   sec   114 MBytes   955 Mbits/sec    0   1.15 MBytes
[  4]   6.00-7.00   sec   112 MBytes   940 Mbits/sec    0   1.22 MBytes
[  4]   7.00-8.00   sec   110 MBytes   924 Mbits/sec    0   1.27 MBytes
[  4]   8.00-9.00   sec   109 MBytes   915 Mbits/sec    0   1.30 MBytes
[  4]   9.00-10.00  sec   109 MBytes   915 Mbits/sec    0   1.32 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec  237      sender     <==== retransmissions here
[  4]   0.00-10.00  sec  1.08 GBytes   927 Mbits/sec           receiver

If I move the iperf3 server VM to a separate physical server, there are no
more retransmissions and zero packet drops.

This leads me to think there is some sporadic resource contention when
running multiple SR-IOV VMs on the same hypervisor that causes packet drops
under a high-throughput test.
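If it helps narrow things down, I can grab the PF statistics on the host
while the drops are happening and post them, e.g. something along these
lines:

# ethtool -S enp4s0f0 | grep -iE 'drop|discard|miss|err'
# ethtool -S enp4s0f1 | grep -iE 'drop|discard|miss|err'
# ip -s link show enp4s0f0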

I even tried to pin the vCPUs to physical cores with 'virsh vcpupin' to
avoid context switches (roughly the commands sketched below), and tried to
allocate physical cores to each VM from the same NUMA node, but I see no
difference.
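The pinning was done roughly like this (the vCPU-to-pCPU mapping matches the
'virsh vcpuinfo' output further down; the other two domains were pinned the
same way to the remaining cores):

# virsh vcpupin bigip-sriov 0 0
# virsh vcpupin bigip-sriov 1 2
# virsh vcpupin bigip-sriov 2 4
# virsh vcpupin bigip-sriov 3 6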

# virsh nodeinfo
CPU model:           x86_64
CPU(s):              16
CPU frequency:       2394 MHz
CPU socket(s):       1
Core(s) per socket:  4
Thread(s) per core:  2
NUMA cell(s):        2
Memory size:         74287756 KiB

# virsh capabilities

    <topology>
      <cells num='2'>
        <cell id='0'>
          <memory unit='KiB'>41285380</memory>
          <pages unit='KiB' size='4'>9797057</pages>
          <pages unit='KiB' size='2048'>1024</pages>
          <pages unit='KiB' size='1048576'>0</pages>
          <distances>
            <sibling id='0' value='10'/>
            <sibling id='1' value='20'/>
          </distances>
          <cpus num='8'>
            <cpu id='0' socket_id='1' core_id='0' siblings='0,8'/>
            <cpu id='2' socket_id='1' core_id='1' siblings='2,10'/>
            <cpu id='4' socket_id='1' core_id='9' siblings='4,12'/>
            <cpu id='6' socket_id='1' core_id='10' siblings='6,14'/>
            <cpu id='8' socket_id='1' core_id='0' siblings='0,8'/>
            <cpu id='10' socket_id='1' core_id='1' siblings='2,10'/>
            <cpu id='12' socket_id='1' core_id='9' siblings='4,12'/>
            <cpu id='14' socket_id='1' core_id='10' siblings='6,14'/>
          </cpus>
        </cell>
        <cell id='1'>
          <memory unit='KiB'>33002376</memory>
          <pages unit='KiB' size='4'>7726306</pages>
          <pages unit='KiB' size='2048'>1024</pages>
          <pages unit='KiB' size='1048576'>0</pages>
          <distances>
            <sibling id='0' value='20'/>
            <sibling id='1' value='10'/>
          </distances>
          <cpus num='8'>
            <cpu id='1' socket_id='0' core_id='0' siblings='1,9'/>
            <cpu id='3' socket_id='0' core_id='1' siblings='3,11'/>
            <cpu id='5' socket_id='0' core_id='9' siblings='5,13'/>
            <cpu id='7' socket_id='0' core_id='10' siblings='7,15'/>
            <cpu id='9' socket_id='0' core_id='0' siblings='1,9'/>
            <cpu id='11' socket_id='0' core_id='1' siblings='3,11'/>
            <cpu id='13' socket_id='0' core_id='9' siblings='5,13'/>
            <cpu id='15' socket_id='0' core_id='10' siblings='7,15'/>
          </cpus>
        </cell>
      </cells>
    </topology>


# virsh vcpuinfo bigip-sriov
VCPU:           0
CPU:            0
State:          running
CPU time:       249177.5s
CPU Affinity:   y---------------

VCPU:           1
CPU:            2
State:          running
CPU time:       195544.8s
CPU Affinity:   --y-------------

VCPU:           2
CPU:            4
State:          running
CPU time:       195515.5s
CPU Affinity:   ----y-----------

VCPU:           3
CPU:            6
State:          running
CPU time:       194525.0s
CPU Affinity:   ------y---------

# virsh vcpuinfo sriov-enp4s0f0-vf1

VCPU:           0
CPU:            1
State:          running
CPU time:       4823.9s
CPU Affinity:   -y--------------

VCPU:           1
CPU:            3
State:          running
CPU time:       4015.0s
CPU Affinity:   ---y------------

VCPU:           2
CPU:            5
State:          running
CPU time:       3732.3s
CPU Affinity:   -----y----------

VCPU:           3
CPU:            7
State:          running
CPU time:       3605.6s
CPU Affinity:   -------y--------


# virsh vcpuinfo sriov-enp4s0f1-vf1
VCPU:           0
CPU:            9
State:          running
CPU time:       2501.8s
CPU Affinity:   ---------y------

VCPU:           1
CPU:            11
State:          running
CPU time:       2494.1s
CPU Affinity:   -----------y----

VCPU:           2
CPU:            13
State:          running
CPU time:       3974.2s
CPU Affinity:   -------------y--

VCPU:           3
CPU:            15
State:          running
CPU time:       2863.9s
CPU Affinity:   ---------------y
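
One thing I have not confirmed yet is which NUMA node the 82599 itself hangs
off, so the pinning above may still leave some guests on the node remote
from the NIC. I assume something like this would tell me:

# cat /sys/class/net/enp4s0f0/device/numa_node
# cat /sys/class/net/enp4s0f1/device/numa_node

If so, I could pin all three VMs (CPUs and memory, via 'virsh vcpupin' and
'virsh numatune') to that node and re-test.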

I found an SR-IOV research doc
(https://pdfs.semanticscholar.org/6678/b17fc8758efea8d32c2d47f9924f8a0cdc6d.pdf),
but it seems old.

Could anyone point me in a direction to look? Let me know if I can provide
more VM details. I work for F5 Networks, and our internal developers found
no problem with the BIG-IP VE driver, so I am thinking this is more in the
area of inter-VM SR-IOV communication and VF <--> PF interaction.