On Sun, Jul 10, 2016 at 7:05 PM, Brenden Blanco <bbla...@plumgrid.com> wrote:
> On Sun, Jul 10, 2016 at 06:25:40PM +0300, Tariq Toukan wrote:
>>
>> On 09/07/2016 10:58 PM, Saeed Mahameed wrote:
>> >On Fri, Jul 8, 2016 at 5:15 AM, Brenden Blanco <bbla...@plumgrid.com> wrote:
>> >>+		/* A bpf program gets first chance to drop the packet. It may
>> >>+		 * read bytes but not past the end of the frag.
>> >>+		 */
>> >>+		if (prog) {
>> >>+			struct xdp_buff xdp;
>> >>+			dma_addr_t dma;
>> >>+			u32 act;
>> >>+
>> >>+			dma = be64_to_cpu(rx_desc->data[0].addr);
>> >>+			dma_sync_single_for_cpu(priv->ddev, dma,
>> >>+						priv->frag_info[0].frag_size,
>> >>+						DMA_FROM_DEVICE);
>> >In case of XDP_PASS we will dma_sync again in the normal path, this
>> >can be improved by doing the dma_sync as soon as we can and once and
>> >for all, regardless of the path the packet is going to take
>> >(XDP_DROP/mlx4_en_complete_rx_desc/mlx4_en_rx_skb).
>> I agree with Saeed, dma_sync is a heavy operation that is now done
>> twice for all packets with XDP_PASS.
>> We should try our best to avoid performance degradation in the flow
>> of unfiltered packets.
> Makes sense, do folks here see a way to do this cleanly?
yes, we need something like:

+static inline void
+mlx4_en_sync_dma(struct mlx4_en_priv *priv,
+		 struct mlx4_en_rx_desc *rx_desc,
+		 int length)
+{
+	dma_addr_t dma;
+	int nr;
+
+	/* Sync dma addresses from HW descriptor */
+	for (nr = 0; nr < priv->num_frags; nr++) {
+		struct mlx4_en_frag_info *frag_info = &priv->frag_info[nr];
+
+		if (length <= frag_info->frag_prefix_size)
+			break;
+
+		dma = be64_to_cpu(rx_desc->data[nr].addr);
+		dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
+					DMA_FROM_DEVICE);
+	}
+}

@@ -790,6 +808,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			goto next;
 		}

+		length = be32_to_cpu(cqe->byte_cnt);
+		length -= ring->fcs_del;
+
+		mlx4_en_sync_dma(priv, rx_desc, length);
 		/* data is available continue processing the packet */

and make sure to remove all explicit dma_sync_single_for_cpu calls.