> From: Du, Frank [mailto:frank...@intel.com]
> Sent: Friday, 24 May 2024 03.05
> 
> > From: Ferruh Yigit <ferruh.yi...@amd.com>
> > Sent: Thursday, May 23, 2024 9:32 PM
> >
> > On 5/23/2024 10:22 AM, Morten Brørup wrote:
> > >> From: Frank Du [mailto:frank...@intel.com]
> > >> Sent: Thursday, 23 May 2024 10.08
> > >>
> > >> The current calculation assumes that the mbufs are contiguous.
> > >> However, this assumption is incorrect when the mbuf memory spans
> > >> huge page boundaries. To ensure that each mbuf resides exclusively
> > >> within a single page, gaps are deliberately left when allocating
> > >> mbufs across those boundaries.
> > >
> > > I agree that this patch is an improvement of what existed previously.
> > > But I still don't understand the patch description. To me, it looks
> > > like the patch adds a missing check for contiguous memory, and the
> > > patch itself has nothing to do with huge pages. Anyway, if the
> > > maintainer agrees with the description, I don't mind not grasping it.
> > > ;-)
> > >
> > > However, while trying to understand what is happening, I think I found
> > > one more (already existing) bug.
> > > I will show through an example inline below.
> > >
> > >>
> > >> Fix this by reading the size directly from the mempool memory chunk.
> > >>
> > >> Fixes: d8a210774e1d ("net/af_xdp: support unaligned umem chunks")
> > >> Cc: sta...@dpdk.org
> > >>
> > >> Signed-off-by: Frank Du <frank...@intel.com>
> > >>
> > >> ---
> > >> v2:
> > >> * Add virtual contiguity detection for multiple memhdrs
> > >> v3:
> > >> * Use RTE_ALIGN_FLOOR to get the aligned addr
> > >> * Add check on the first memhdr of memory chunks
> > >> v4:
> > >> * Replace the iterating with simple nb_mem_chunks check
> > >> ---
> > >>  drivers/net/af_xdp/rte_eth_af_xdp.c | 33 +++++++++++++++++++++++------
> > >>  1 file changed, 26 insertions(+), 7 deletions(-)
> > >>
> > >> diff --git a/drivers/net/af_xdp/rte_eth_af_xdp.c b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> index 6ba455bb9b..d0431ec089 100644
> > >> --- a/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> +++ b/drivers/net/af_xdp/rte_eth_af_xdp.c
> > >> @@ -1040,16 +1040,32 @@ eth_link_update(struct rte_eth_dev *dev __rte_unused,
> > >>  }
> > >>
> > >>  #if defined(XDP_UMEM_UNALIGNED_CHUNK_FLAG)
> > >> -static inline uintptr_t get_base_addr(struct rte_mempool *mp, uint64_t *align)
> > >> +static inline uintptr_t
> > >> +get_memhdr_info(const struct rte_mempool *mp, uint64_t *align,
> > >> +size_t *len)
> > >>  {
> > >>          struct rte_mempool_memhdr *memhdr;
> > >>          uintptr_t memhdr_addr, aligned_addr;
> > >>
> > >> +        if (mp->nb_mem_chunks != 1) {
> > >> +                /*
> > >> +                 * The mempool with multiple chunks is not virtual contiguous but
> > >> +                 * xsk umem only support single virtual region mapping.
> > >> +                 */
> > >> +                AF_XDP_LOG(ERR, "The mempool contain multiple %u memory chunks\n",
> > >> +                                   mp->nb_mem_chunks);
> > >> +                return 0;
> > >> +        }
> > >> +
> > >> +        /* Get the mempool base addr and align from the header now */
> > >>          memhdr = STAILQ_FIRST(&mp->mem_list);
> > >> +        if (!memhdr) {
> > >> +                AF_XDP_LOG(ERR, "The mempool is not populated\n");
> > >> +                return 0;
> > >> +        }
> > >>          memhdr_addr = (uintptr_t)memhdr->addr;
> > >> -        aligned_addr = memhdr_addr & ~(getpagesize() - 1);
> > >> +        aligned_addr = RTE_ALIGN_FLOOR(memhdr_addr, getpagesize());
> > >>          *align = memhdr_addr - aligned_addr;
> > >> -
> > >> +        *len = memhdr->len;
> > >>          return aligned_addr;
> > >
> > > On x86_64, the page size is 4 KB = 0x1000.
> > >
> > > Let's look at an example where memhdr->addr is not aligned to the
> > > page size:
> > >
> > > In the example,
> > > memhdr->addr is 0x700100, and
> > > memhdr->len is 0x20000.
> > >
> > > Then
> > > aligned_addr becomes 0x700000,
> > > *align becomes 0x100, and
> > > *len becomes 0x20000.
> > >
> > >>  }
> > >>
> > >> @@ -1126,6 +1142,7 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> > >>          void *base_addr = NULL;
> > >>          struct rte_mempool *mb_pool = rxq->mb_pool;
> > >>          uint64_t umem_size, align = 0;
> > >> +        size_t len = 0;
> > >>
> > >>          if (internals->shared_umem) {
> > >>                  if (get_shared_umem(rxq, internals->if_name, &umem) < 0)
> > >> @@ -1157,10 +1174,12 @@ xsk_umem_info *xdp_umem_configure(struct pmd_internals *internals,
> > >>                  }
> > >>
> > >>                  umem->mb_pool = mb_pool;
> > >> -                base_addr = (void *)get_base_addr(mb_pool, &align);
> > >> -                umem_size = (uint64_t)mb_pool->populated_size *
> > >> -                                (uint64_t)usr_config.frame_size +
> > >> -                                align;
> > >> +                base_addr = (void *)get_memhdr_info(mb_pool, &align, &len);
> > >> +                if (!base_addr) {
> > >> +                        AF_XDP_LOG(ERR, "The memory pool can't be mapped as umem\n");
> > >> +                        goto err;
> > >> +                }
> > >> +                umem_size = (uint64_t)len + align;
> > >
> > > Here, umem_size becomes 0x20100.
> > >
> > >>
> > >>                  ret = xsk_umem__create(&umem->umem, base_addr, umem_size,
> > >>                                  &rxq->fq, &rxq->cq, &usr_config);
> > >
> > > Here, xsk_umem__create() is called with the base_address (0x700000)
> > > preceding the address of the memory chunk (0x700100).
> > > It looks like a bug, causing a buffer underrun. I.e. will it access
> > > memory starting at base_address?
> > >
> >
> > I already asked about this on v2; Frank mentioned that the area is not
> > accessed and that having the gap is safe.
> 
> xsk_umem__create() requires a base address that is aligned to a page boundary.
> And, there is no chance to access the area between 0x700000 and 0x700100,
> because the memory pointer for each XSK TX/RX descriptor is derived from the
> mbuf data area.

OK, thanks for explaining.
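
For the record, the arithmetic from the example above can be sketched as
standalone C. This is only an illustration under my own assumptions:
struct umem_region and umem_region_from_chunk() are hypothetical names that
do not exist in the driver, the page size is passed in instead of being read
with getpagesize(), and the mask stands in for RTE_ALIGN_FLOOR on a
power-of-two page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not part of the driver. */
struct umem_region {
	uintptr_t base;  /* page-aligned start handed to xsk_umem__create() */
	uint64_t align;  /* gap between base and the start of the chunk */
	uint64_t size;   /* chunk length plus the leading gap */
};

/*
 * Round the chunk address down to a page boundary and grow the region by
 * the resulting gap, mirroring what the patch does with memhdr->addr and
 * memhdr->len. page_size is assumed to be a power of two.
 */
static struct umem_region
umem_region_from_chunk(uintptr_t chunk_addr, size_t chunk_len,
		       uintptr_t page_size)
{
	struct umem_region r;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	r.base = chunk_addr & ~(page_size - 1);
	r.align = chunk_addr - r.base;
	r.size = (uint64_t)chunk_len + r.align;
	return r;
}
```

With chunk_addr = 0x700100, chunk_len = 0x20000 and page_size = 0x1000, this
gives base = 0x700000, align = 0x100 and size = 0x20100. The region still
ends exactly where the chunk ends (0x720100); only the 0x100 bytes in front
of the chunk are extra, and per Frank's explanation no descriptor points
there.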

Acked-by: Morten Brørup <m...@smartsharesystems.com>

> 
> >
> > > If I'm correct, the code should probably do this for alignment instead:
> > >
> > > aligned_addr = RTE_ALIGN_CEIL(memhdr_addr, getpagesize());
> > > *align = aligned_addr - memhdr_addr;
> > > umem_size = (uint64_t)len - align;
> > >
> > >
> > > Disclaimer: I don't know much about the AF_XDP implementation, so maybe
> > > I just don't understand what is going on.
> > >
> > >> --
> > >> 2.34.1
> > >
