Hi Matan, On Tue, Oct 31, 2017 at 11:35:29AM +0000, Matan Azrad wrote: <snip> > > -----Original Message----- > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > > Sent: Tuesday, October 31, 2017 12:17 PM > > To: Matan Azrad <ma...@mellanox.com> > > Cc: dev@dpdk.org; Ophir Munk <ophi...@mellanox.com> > > Subject: Re: [PATCH v3 6/7] net/mlx4: mitigate Tx path memory barriers > > > > Hi Matan, > > > > On Mon, Oct 30, 2017 at 07:47:20PM +0000, Matan Azrad wrote: > > > Hi Adrien > > > > > > > -----Original Message----- > > > > From: Adrien Mazarguil [mailto:adrien.mazarg...@6wind.com] > > > > Sent: Monday, October 30, 2017 4:24 PM > > > > To: Matan Azrad <ma...@mellanox.com> > > > > Cc: dev@dpdk.org; Ophir Munk <ophi...@mellanox.com> > > > > Subject: Re: [PATCH v3 6/7] net/mlx4: mitigate Tx path memory > > > > barriers > > > > > > > > On Mon, Oct 30, 2017 at 10:07:28AM +0000, Matan Azrad wrote: > > > > > Replace most of the memory barriers by compiler barriers since > > > > > they are all targeted to the DRAM; This improves code efficiency > > > > > for systems which force store order between different addresses. > > > > > > > > > > Only the doorbell record store should be protected by memory > > > > > barrier since it is targeted to the PCI memory domain. > > > > > > > > > > Limit pre byte count store compiler barrier for systems with cache > > > > > line size smaller than 64B (TXBB size). > > > > > > > > > > Signed-off-by: Matan Azrad <ma...@mellanox.com> <snip> > > > > > cq->cons_index = cons_index; > > > > > *cq->set_ci_db = rte_cpu_to_be_32(cq->cons_index & > > > > MLX4_CQ_DB_CI_MASK); > > > > > - rte_wmb(); > > > > > + rte_io_wmb(); > > > > > > > > This one could be removed entirely as well, which is more or less > > > > what the move to a compiler barrier does. Nothing in subsequent code > > > > depends on this doorbell being written, so this can piggy back on > > > > any subsequent rte_wmb(). > > > > > > Yes, you right, probably this code was taken from multi thread > > implementation. > > > > > > > > On the other hand in my opinion a barrier (compiler or otherwise) > > > > might be needed before the doorbell write, to make clear it cannot > > > > somehow be done earlier in case something attempts to optimize it > > away. > > > > > > > I think we can remove it entirely (compiler can't optimize the ci_db store > > since in depends in previous code (cons_index). > > > > Right, however you may still run into issues if the compiler determines the > > final cons_index value by looking at the loop and decides to store it before > > entering/leaving it. That's the kind of problematic optimization I was > > thinking > > of. > > > > The barrier in that sense is just to assert the order of seemingly unrelated > > load/stores. > > I think that If I left the rte_io_rmb after CQE owner check we can earn both: > 1. The concern of read ordering while reading the owner before using CQE. > 2. The ci_db concern: the compiler must read the last CQE(which is not valid > and we have no more stamp to do) before it knows the last value of > cons_index. > So we can remove entirely this rte_io_wmb in completion function. > What do you think?
That's right, this means there's a barrier before the doorbell write in any case, OK then. Just make sure cq->set_ci_db is volatile in a prior "fix" commit as described in my previous suggestion, otherwise the remaining barriers won't guarantee much. Thanks. -- Adrien Mazarguil 6WIND