Hi Ophir/Matan, On Mon, Oct 23, 2017 at 02:21:58PM +0000, Ophir Munk wrote: > From: Matan Azrad <ma...@mellanox.com> > > Remove usage of variables which doesn't add new information for > performance improvement. > > Signed-off-by: Matan Azrad <ma...@mellanox.com>
I'm almost 100% sure this commit wasn't validated for performance on its own. Don't mention "performance improvement" in that case. If you're removing a couple of local variables for readability, just say so. More below. > --- > drivers/net/mlx4/mlx4_rxtx.c | 8 ++------ > 1 file changed, 2 insertions(+), 6 deletions(-) > > diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c > index 014a6d3..e8d9a35 100644 > --- a/drivers/net/mlx4/mlx4_rxtx.c > +++ b/drivers/net/mlx4/mlx4_rxtx.c > @@ -285,8 +285,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, > uint16_t pkts_n) > struct txq *txq = (struct txq *)dpdk_txq; > unsigned int elts_head = txq->elts_head; > const unsigned int elts_n = txq->elts_n; > - unsigned int elts_comp = 0; > - unsigned int bytes_sent = 0; > unsigned int i; > unsigned int max; > struct mlx4_sq *sq = &txq->msq; > @@ -498,8 +496,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, > uint16_t pkts_n) > MLX4_BIT_WQE_OWN : 0)); > sq->head += nr_txbbs; > elt->buf = buf; > - bytes_sent += buf->pkt_len; > - ++elts_comp; > + txq->stats.obytes += buf->pkt_len; > elts_head = elts_head_next; > } > /* Take a shortcut if nothing must be sent. */ > @@ -507,13 +504,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, > uint16_t pkts_n) > return 0; > /* Increment send statistics counters. */ > txq->stats.opackets += i; > - txq->stats.obytes += bytes_sent; > /* Make sure that descriptors are written before doorbell record. */ > rte_wmb(); > /* Ring QP doorbell. */ > rte_write32(txq->msq.doorbell_qpn, txq->msq.db); > txq->elts_head = elts_head; > - txq->elts_comp += elts_comp; > + txq->elts_comp += i; > return i; > } For bytes_sent, reading these changes and assuming -O0 with the compiler attempting to convert the code without reordering/improving things, this replaces register variables used in a loop with memory operations on a large structure through a pointer (txq->stats update for every iteration instead of once at the end). Are you really sure this is more optimized that way? Although the compiler likely does it already with -O3, helping it avoid unnecessary memory writes is good in my opinion. OK for the removal of redundant elts_comp though. Although I'm pretty sure once again the compiler didn't wait for this patch to optimize it away. -- Adrien Mazarguil 6WIND