Hi, Chas Williams
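Before the inline replies below, here is the VF link-update call path as I
read the current code, since the whole discussion hinges on it. This is a
paraphrase of the driver flow using the function names discussed in this
thread, not the literal source:

    /* VF .link_update path in drivers/net/ixgbe, paraphrased */
    rte_eth_link_get_nowait(port_id, &link)            /* wait_to_complete = 0 */
        -> ixgbe_dev_link_update(dev, 0)
            -> ixgbe_dev_link_update_share(dev, 0, 1 /* vf */)
                -> ixgbevf_check_link(hw, &speed, &link_up, 0)
                    /* wait_to_complete == 0: PF mailbox read is skipped */

rte_eth_link_get() takes the same path with wait_to_complete = 1, and only
that path performs the mailbox read that interrupts the PF.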
> -----Original Message-----
> From: Chas Williams [mailto:3ch...@gmail.com]
> Sent: Thursday, November 8, 2018 2:55 AM
> To: Zhao1, Wei <wei.zh...@intel.com>; Luca Boccassi <bl...@debian.org>;
> dev@dpdk.org
> Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Ananyev, Konstantin
> <konstantin.anan...@intel.com>; sta...@dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
>
> On 11/07/2018 04:17 AM, Zhao1, Wei wrote:
> > Hi, Luca Boccassi
> >
> > The purpose of this patch is to reduce mailbox interrupts from VF to
> > PF, but there seem to be some points in this patch that need discussion.
> >
> > First, I do not understand why you change the code of
> > ixgbe_check_mac_link_vf(): for a VF, rte_eth_link_get_nowait() and
> > rte_eth_link_get() call ixgbe_dev_link_update() ->
> > ixgbe_dev_link_update_share() -> ixgbevf_check_link(), NOT the
> > ixgbe_check_mac_link_vf() that your patch modifies!
> >
> > Second, in ixgbevf_check_link() there is a mailbox read operation for
> > the VF, "if (mbx->ops.read(hw, &in_msg, 1, 0))", which is
> > ixgbe_read_mbx_vf(). That read causes the interrupt from VF to PF,
> > which is exactly the problem this patch wants to solve.
> > So you use the autoneg_wait_to_complete flag to gate this mailbox
> > read; if you use rte_eth_link_get_nowait(), which sets
> > autoneg_wait_to_complete = 0, the interrupts from VF to PF are reduced.
> >
> > But I do not think this patch is necessary, because
> > ixgbevf_check_link() already has
>
> I think you are right here. This patch dates to before the addition of the vf
> argument to ixgbe_dev_link_update_share() and the split of .link_update
> between ixgbe and ixgbevf. At one point, this patch was especially beneficial
> if you were running bonding (which tends to make quite a few link status
> checks).

If you have another idea based on this patch, I am willing to review and ack it.

> So this patch probably hasn't been helping at this point. I will try to get
> some time to locally test this.

> > "
> > bool no_pflink_check = wait_to_complete == 0;
> >
> > ////////////////////////
> >
> > if (no_pflink_check) {
> >         if (*speed == IXGBE_LINK_SPEED_UNKNOWN)
> >                 mac->get_link_status = true;
> >         else
> >                 mac->get_link_status = false;
> >
> >         goto out;
> > }
> > "
> > The comment "for a quick link status checking, wait_to_compelet == 0,
> > skip PF link status checking" is clear.
> >
> > That means rte_eth_link_get_nowait() already skips this mailbox read,
> > and only rte_eth_link_get() triggers the interrupt. So I think all
> > you need to do is replace rte_eth_link_get() with
> > rte_eth_link_get_nowait() in your APP; that will reduce the VF-to-PF
> > mailbox read interrupts.
> >
> >
> >> -----Original Message-----
> >> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Luca Boccassi
> >> Sent: Wednesday, August 15, 2018 10:15 PM
> >> To: dev@dpdk.org
> >> Cc: Lu, Wenzhuo <wenzhuo...@intel.com>; Ananyev, Konstantin
> >> <konstantin.anan...@intel.com>; Luca Boccassi <bl...@debian.org>;
> >> sta...@dpdk.org
> >> Subject: [dpdk-dev] [PATCH] net/ixgbe: reduce PF mailbox interrupt rate
> >>
> >> We have observed a high rate of NIC PF interrupts when a VNF uses the
> >> DPDK APIs rte_eth_link_get_nowait() and rte_eth_link_get(), as they
> >> cause the VF driver to send many MBOX ACK messages.
> >>
> >> With these changes, the interrupt rates go down significantly.
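To make my earlier suggestion concrete for the application side: the change
is just the choice of API. A minimal sketch, assuming the DPDK 18.x
signatures where both calls return void; the helper name and polling context
are placeholders of mine:

    #include <rte_ethdev.h>

    static void poll_link_status(uint16_t port_id)
    {
            struct rte_eth_link link;

            /* wait_to_complete = 0: the VF reads the LINKS register only,
             * skips the PF mailbox read, and raises no PF interrupt.
             */
            rte_eth_link_get_nowait(port_id, &link);

            /* wait_to_complete = 1: the VF also reads and ACKs the PF
             * mailbox, which is what interrupts the PF.
             */
            rte_eth_link_get(port_id, &link);
    }

In a polling loop only the first call should be needed; the second is shown
for contrast.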
> >> Here's some testing results:
> >>
> >> Without the patch:
> >>
> >> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
> >>         CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7   CPU8   CPU9  CPU10  CPU11  CPU12  CPU13  CPU14  CPU15
> >>  34:     88      0      0      0      0     41     30    509      0      0    350     24     88    114    461    562   PCI-MSI 1572864-edge   ens1f0-TxRx-0
> >>  35:     49     24      0      0     65    130     64     29     67      0     10      0      0     46     38    764   PCI-MSI 1572865-edge   ens1f0-TxRx-1
> >>  36:     53      0      0     64     15     85    132     71    108      0     30      0    165    215    303    104   PCI-MSI 1572866-edge   ens1f0-TxRx-2
> >>  37:     46    196      0      0     10     48     62     68     51      0      0      0    103     82     54    192   PCI-MSI 1572867-edge   ens1f0-TxRx-3
> >>  38:    226      0      0      0    159    145    749    265      0      0    202      0  69229    166    450      0   PCI-MSI 1572868-edge   ens1f0
> >>  52:     95    896      0      0      0     18     53      0    494      0      0      0      0    265     79    124   PCI-MSI 1574912-edge   ens1f1-TxRx-0
> >>  53:     50      0     18      0     72     33      0    168    330      0      0      0    141     22     12     65   PCI-MSI 1574913-edge   ens1f1-TxRx-1
> >>  54:     65      0      0      0    239    104    166     49    442      0      0      0    126     26    307      0   PCI-MSI 1574914-edge   ens1f1-TxRx-2
> >>  55:     57      0      0      0    123     35     83     54    157    106      0      0     26     29    312     97   PCI-MSI 1574915-edge   ens1f1-TxRx-3
> >>  56:    232      0  13910      0     16     21      0  54422      0      0      0     24     25      0     78      0   PCI-MSI 1574916-edge   ens1f1
> >>         CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7   CPU8   CPU9  CPU10  CPU11  CPU12  CPU13  CPU14  CPU15
> >>  34:     88      0      0      0      0     41     30    509      0      0    350     24     88    119    461    562   PCI-MSI 1572864-edge   ens1f0-TxRx-0
> >>  35:     49     24      0      0     65    130     64     29     67      0     10      0      0     46     38    771   PCI-MSI 1572865-edge   ens1f0-TxRx-1
> >>  36:     53      0      0     64     15     85    132     71    108      0     30      0    165    215    303    113   PCI-MSI 1572866-edge   ens1f0-TxRx-2
> >>  37:     46    196      0      0     10     48     62     68     56      0      0      0    103     82     54    192   PCI-MSI 1572867-edge   ens1f0-TxRx-3
> >>  38:    226      0      0      0    159    145    749    265      0      0    202      0  71281    166    450      0   PCI-MSI 1572868-edge   ens1f0
> >>  52:     95    896      0      0      0     18     53      0    494      0      0      0      0    265     79    133   PCI-MSI 1574912-edge   ens1f1-TxRx-0
> >>  53:     50      0     18      0     72     33      0    173    330      0      0      0    141     22     12     65   PCI-MSI 1574913-edge   ens1f1-TxRx-1
> >>  54:     65      0      0      0    239    104    166     49    442      0      0      0    126     26    312      0   PCI-MSI 1574914-edge   ens1f1-TxRx-2
> >>  55:     57      0      0      0    123     35     83     59    157    106      0      0     26     29    312     97   PCI-MSI 1574915-edge   ens1f1-TxRx-3
> >>  56:    232      0  15910      0     16     21      0  54422      0      0      0     24     25      0     78      0   PCI-MSI 1574916-edge   ens1f1
> >>
> >> During the 10s interval, CPU2 jumped by 2000 interrupts, CPU12 by
> >> 2052 interrupts, for about 200 interrupts/second. That's on the order
> >> of what we expect. I would have guessed 100/s but perhaps there are
> >> two mailbox messages.
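The arithmetic checks out against the counters above: IRQ 38 (ens1f0) grew
from 69229 to 71281 on CPU12, i.e. 71281 - 69229 = 2052 interrupts in 10 s,
and IRQ 56 (ens1f1) grew from 13910 to 15910 on CPU2, i.e. 2000 in 10 s, so
each port really is taking roughly 200 mailbox interrupts per second without
the patch.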
> >>
> >> With the patch:
> >>
> >> $ egrep 'CPU|ens1f' /proc/interrupts ; sleep 10; egrep 'CPU|ens1f' /proc/interrupts
> >>         CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7   CPU8   CPU9  CPU10  CPU11  CPU12  CPU13  CPU14  CPU15
> >>  34:     88      0      0      0      0     25     19    177      0      0    350     24     88    100    362    559   PCI-MSI 1572864-edge   ens1f0-TxRx-0
> >>  35:     49     19      0      0     65    130     64     29     67      0     10      0      0     46     38    543   PCI-MSI 1572865-edge   ens1f0-TxRx-1
> >>  36:     53      0      0     64     15     53     85     71    108      0     24      0     85    215    292     31   PCI-MSI 1572866-edge   ens1f0-TxRx-2
> >>  37:     46    196      0      0     10     43     57     39     19      0      0      0     78     69     49    149   PCI-MSI 1572867-edge   ens1f0-TxRx-3
> >>  38:    226      0      0      0    159    145    749    247      0      0    202      0  58250      0    450      0   PCI-MSI 1572868-edge   ens1f0
> >>  52:     95    896      0      0      0     18     53      0    189      0      0      0      0    265     79     25   PCI-MSI 1574912-edge   ens1f1-TxRx-0
> >>  53:     50      0     18      0     72     33      0     90    330      0      0      0    136      5     12      0   PCI-MSI 1574913-edge   ens1f1-TxRx-1
> >>  54:     65      0      0      0     10    104    166     49    442      0      0      0    126     26    226      0   PCI-MSI 1574914-edge   ens1f1-TxRx-2
> >>  55:     57      0      0      0     61     35     83     30    157    101      0      0     26     15    312      0   PCI-MSI 1574915-edge   ens1f1-TxRx-3
> >>  56:    232      0   2062      0     16     21      0  54422      0      0      0     24     25      0     78      0   PCI-MSI 1574916-edge   ens1f1
> >>         CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7   CPU8   CPU9  CPU10  CPU11  CPU12  CPU13  CPU14  CPU15
> >>  34:     88      0      0      0      0     25     19    177      0      0    350     24     88    102    362    562   PCI-MSI 1572864-edge   ens1f0-TxRx-0
> >>  35:     49     19      0      0     65    130     64     29     67      0     10      0      0     46     38    548   PCI-MSI 1572865-edge   ens1f0-TxRx-1
> >>  36:     53      0      0     64     15     53     85     71    108      0     24      0     85    215    292     36   PCI-MSI 1572866-edge   ens1f0-TxRx-2
> >>  37:     46    196      0      0     10     45     57     39     19      0      0      0     78     69     49    152   PCI-MSI 1572867-edge   ens1f0-TxRx-3
> >>  38:    226      0      0      0    159    145    749    247      0      0    202      0  58259      0    450      0   PCI-MSI 1572868-edge   ens1f0
> >>  52:     95    896      0      0      0     18     53      0    194      0      0      0      0    265     79     25   PCI-MSI 1574912-edge   ens1f1-TxRx-0
> >>  53:     50      0     18      0     72     33      0     95    330      0      0      0    136      5     12      0   PCI-MSI 1574913-edge   ens1f1-TxRx-1
> >>  54:     65      0      0      0     10    104    166     49    442      0      0      0    126     26    231      0   PCI-MSI 1574914-edge   ens1f1-TxRx-2
> >>  55:     57      0      0      0     66     35     83     30    157    101      0      0     26     15    312      0   PCI-MSI 1574915-edge   ens1f1-TxRx-3
> >>  56:    232      0   2071      0     16     21      0  54422      0      0      0     24     25      0     78      0   PCI-MSI 1574916-edge   ens1f1
> >>
> >> Note the interrupt rate has gone way down. During the 10s interval,
> >> we only saw a handful of interrupts.
> >>
> >> Note that this patch was originally provided by Intel directly to
> >> AT&T and Vyatta, but unfortunately I am unable to find records of the
> >> exact author.
> >>
> >> We have been using this in production for more than a year.
> >>
> >> Fixes: af75078fece3 ("first public release")
> >> Cc: sta...@dpdk.org
> >>
> >> Signed-off-by: Luca Boccassi <bl...@debian.org>
> >> ---
> >>  drivers/net/ixgbe/base/ixgbe_vf.c | 33 +++++++++++++++++----------------
> >>  1 file changed, 17 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/net/ixgbe/base/ixgbe_vf.c b/drivers/net/ixgbe/base/ixgbe_vf.c
> >> index 5b25a6b4d4..16086670b1 100644
> >> --- a/drivers/net/ixgbe/base/ixgbe_vf.c
> >> +++ b/drivers/net/ixgbe/base/ixgbe_vf.c
> >> @@ -586,7 +586,6 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
> >>  	s32 ret_val = IXGBE_SUCCESS;
> >>  	u32 links_reg;
> >>  	u32 in_msg = 0;
> >> -	UNREFERENCED_1PARAMETER(autoneg_wait_to_complete);
> >>
> >>  	/* If we were hit with a reset drop the link */
> >>  	if (!mbx->ops.check_for_rst(hw, 0) || !mbx->timeout)
> >> @@ -643,23 +642,25 @@ s32 ixgbe_check_mac_link_vf(struct ixgbe_hw *hw, ixgbe_link_speed *speed,
> >>  		*speed = IXGBE_LINK_SPEED_UNKNOWN;
> >>  	}
> >>
> >> -	/* if the read failed it could just be a mailbox collision, best wait
> >> -	 * until we are called again and don't report an error
> >> -	 */
> >> -	if (mbx->ops.read(hw, &in_msg, 1, 0))
> >> -		goto out;
> >> +	if (autoneg_wait_to_complete) {
> >> +		/* if the read failed it could just be a mailbox collision, best wait
> >> +		 * until we are called again and don't report an error
> >> +		 */
> >> +		if (mbx->ops.read(hw, &in_msg, 1, 0))
> >> +			goto out;
> >>
> >> -	if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> >> -		/* msg is not CTS and is NACK we must have lost CTS status */
> >> -		if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> >> +		if (!(in_msg & IXGBE_VT_MSGTYPE_CTS)) {
> >> +			/* msg is not CTS and is NACK we must have lost CTS status */
> >> +			if (in_msg & IXGBE_VT_MSGTYPE_NACK)
> >> +				ret_val = -1;
> >> +			goto out;
> >> +		}
> >> +
> >> +		/* the pf is talking, if we timed out in the past we reinit */
> >> +		if (!mbx->timeout) {
> >>  			ret_val = -1;
> >> -		goto out;
> >> -	}
> >> -
> >> -	/* the pf is talking, if we timed out in the past we reinit */
> >> -	if (!mbx->timeout) {
> >> -		ret_val = -1;
> >> -		goto out;
> >> +			goto out;
> >> +		}
> >>  	}
> >>
> >>  	/* if we passed all the tests above then the link is up and we no
> >> --
> >> 2.18.0
> >
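One last illustration, in case it helps future readers: the whole fix is a
gating pattern that is easy to show outside the driver. A toy, compilable
sketch; the function names and printouts are simplified stand-ins of mine,
not the real ixgbe types or mailbox ops:

    #include <stdbool.h>
    #include <stdio.h>

    /* stands in for mbx->ops.read(): in the real driver this ACKs the
     * message and raises the VF-to-PF mailbox interrupt */
    static void pf_mailbox_read(void)
    {
            printf("mailbox read -> PF interrupt\n");
    }

    static void check_link(bool wait_to_complete)
    {
            /* the LINKS register is always consulted (omitted here);
             * only the waiting path touches the mailbox */
            if (wait_to_complete)
                    pf_mailbox_read();
            else
                    printf("LINKS register only, no mailbox traffic\n");
    }

    int main(void)
    {
            check_link(false);      /* rte_eth_link_get_nowait() path */
            check_link(true);       /* rte_eth_link_get() path */
            return 0;
    }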