Hi Ophir/Matan,

On Mon, Oct 23, 2017 at 02:21:58PM +0000, Ophir Munk wrote:
> From: Matan Azrad <ma...@mellanox.com>
> 
> Remove usage of variables which doesn't add new information for
> performance improvement.
> 
> Signed-off-by: Matan Azrad <ma...@mellanox.com>

I'm almost 100% sure this commit wasn't validated for performance on its
own. Don't mention "performance improvement" in that case.

If you're removing a couple of local variables for readability, just say
so.

More below.

> ---
>  drivers/net/mlx4/mlx4_rxtx.c | 8 ++------
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/mlx4/mlx4_rxtx.c b/drivers/net/mlx4/mlx4_rxtx.c
> index 014a6d3..e8d9a35 100644
> --- a/drivers/net/mlx4/mlx4_rxtx.c
> +++ b/drivers/net/mlx4/mlx4_rxtx.c
> @@ -285,8 +285,6 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
> uint16_t pkts_n)
>       struct txq *txq = (struct txq *)dpdk_txq;
>       unsigned int elts_head = txq->elts_head;
>       const unsigned int elts_n = txq->elts_n;
> -     unsigned int elts_comp = 0;
> -     unsigned int bytes_sent = 0;
>       unsigned int i;
>       unsigned int max;
>       struct mlx4_sq *sq = &txq->msq;
> @@ -498,8 +496,7 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
> uint16_t pkts_n)
>                                                      MLX4_BIT_WQE_OWN : 0));
>               sq->head += nr_txbbs;
>               elt->buf = buf;
> -             bytes_sent += buf->pkt_len;
> -             ++elts_comp;
> +             txq->stats.obytes += buf->pkt_len;
>               elts_head = elts_head_next;
>       }
>       /* Take a shortcut if nothing must be sent. */
> @@ -507,13 +504,12 @@ mlx4_tx_burst(void *dpdk_txq, struct rte_mbuf **pkts, 
> uint16_t pkts_n)
>               return 0;
>       /* Increment send statistics counters. */
>       txq->stats.opackets += i;
> -     txq->stats.obytes += bytes_sent;
>       /* Make sure that descriptors are written before doorbell record. */
>       rte_wmb();
>       /* Ring QP doorbell. */
>       rte_write32(txq->msq.doorbell_qpn, txq->msq.db);
>       txq->elts_head = elts_head;
> -     txq->elts_comp += elts_comp;
> +     txq->elts_comp += i;
>       return i;
>  }

For bytes_sent, reading these changes and assuming -O0 with the compiler
attempting to convert the code without reordering/improving things, this
replaces register variables used in a loop with memory operations on a large
structure through a pointer (txq->stats update for every iteration instead
of once at the end).

Are you really sure this is more optimized that way? Although the compiler
likely does it already with -O3, helping it avoid unnecessary memory writes
is good in my opinion.

OK for the removal of redundant elts_comp though. Although I'm pretty sure
once again the compiler didn't wait for this patch to optimize it away.

-- 
Adrien Mazarguil
6WIND

Reply via email to