> -----Original Message----- > From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > Sent: Thursday, July 30, 2015 10:10 AM > To: Ananyev, Konstantin; Zhang, Helin; Martin Weiser > Cc: dev at dpdk.org > Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > mbuf private area size is odd > > Hi Konstantin, > > On 07/30/2015 11:00 AM, Ananyev, Konstantin wrote: > > Hi Olivier, > > > >> -----Original Message----- > >> From: Olivier MATZ [mailto:olivier.matz at 6wind.com] > >> Sent: Thursday, July 30, 2015 9:12 AM > >> To: Zhang, Helin; Ananyev, Konstantin; Martin Weiser > >> Cc: dev at dpdk.org > >> Subject: Re: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > >> mbuf private area size is odd > >> > >> Hi, > >> > >> On 07/29/2015 10:24 PM, Zhang, Helin wrote: > >>> Hi Martin > >>> > >>> Thank you very much for the good catch! > >>> > >>> The similar situation in i40e, as explained by Konstantin. > >>> As header split hasn't been supported by DPDK till now. It would be > >>> better to put the header address in RX descriptor to 0. > >>> But in the future, during header split enabling. We may need to pay extra > >>> attention to that. As at least x710 datasheet said > >> specifically as below. > >>> "The header address should be set by the software to an even number (word > >>> aligned address)". We may need to find a way to > >> ensure that during mempool/mbuf allocation. > >> > >> Indeed it would be good to force the priv_size to be aligned. > >> > >> The priv_size could be aligned automatically in > >> rte_pktmbuf_pool_create(). The only possible problem I could see > >> is that it would break applications that access to the data buffer > >> by doing (sizeof(mbuf) + sizeof(priv)), which is probably not the > >> best thing to do (I didn't find any applications like this in dpdk). > > > > > > Might be just make rte_pktmbuf_pool_create() fail if input priv_size % > > MIN_ALIGN != 0? > > Hmm maybe it would break more applications: an odd priv_size is > probably rare, but a priv_size that is not aligned to 8 bytes is > maybe more common.
My thought was that rte_mempool_create() was just introduced in 2.1, so if we add extra requirement for the input parameter now - there would be no ABI breakage, and not many people started to use it already. For me just seems a bit easier and more straightforward then silent alignment - user would not have wrong assumptions here. Though if you think that a silent alignment would be more convenient for most users - I wouldn't insist. Konstantin > It's maybe safer to align the size transparently? > > > Regards, > Olivier > > > > > > >> > >> For applications that directly use rte_mempool_create() instead of > >> rte_pktmbuf_pool_create(), we could add a check using an assert in > >> rte_pktmbuf_init() and some words in the documentation. > >> > >> The question is: what should be the proper alignment? I would say > >> at least 8 bytes, but maybe cache_aligned is an option too. > > > > 8 bytes seems enough to me. > > > > Konstantin > > > >> > >> Regards, > >> Olivier > >> > >> > >>> > >>> Regards, > >>> Helin > >>> > >>>> -----Original Message----- > >>>> From: Ananyev, Konstantin > >>>> Sent: Wednesday, July 29, 2015 11:12 AM > >>>> To: Martin Weiser; Zhang, Helin; olivier.matz at 6wind.com > >>>> Cc: dev at dpdk.org > >>>> Subject: RE: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e > >>>> when mbuf > >>>> private area size is odd > >>>> > >>>> Hi Martin, > >>>> > >>>>> -----Original Message----- > >>>>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Martin Weiser > >>>>> Sent: Wednesday, July 29, 2015 4:07 PM > >>>>> To: Zhang, Helin; olivier.matz at 6wind.com > >>>>> Cc: dev at dpdk.org > >>>>> Subject: [dpdk-dev] Issue with non-scattered rx in ixgbe and i40e when > >>>>> mbuf private area size is odd > >>>>> > >>>>> Hi Helin, Hi Olivier, > >>>>> > >>>>> we are seeing an issue with the ixgbe and i40e drivers which we could > >>>>> track down to our setting of the private area size of the mbufs. > >>>>> The issue can be easily reproduced with the l2fwd example application > >>>>> when a small modification is done: just set the priv_size parameter in > >>>>> the call to the rte_pktmbuf_pool_create function to an odd number like > >>>>> 1. In our tests this causes every call to rte_eth_rx_burst to return > >>>>> 32 (which is the setting of nb_pkts) nonsense mbufs although no > >>>>> packets are received on the interface and the hardware counters do not > >>>>> report any received packets. > >>>> > >>>> From Niantic datasheet: > >>>> > >>>> "7.1.6.1 Advanced Receive Descriptors ? Read Format Table 7-15 lists the > >>>> advanced receive descriptor programming by the software. The ... > >>>> Packet Buffer Address (64) > >>>> This is the physical address of the packet buffer. The lowest bit is A0 > >>>> (LSB of the > >>>> address). > >>>> Header Buffer Address (64) > >>>> The physical address of the header buffer with the lowest bit being > >>>> Descriptor > >>>> Done (DD). > >>>> When a packet spans in multiple descriptors, the header buffer address > >>>> is used > >>>> only on the first descriptor. During the programming phase, software > >>>> must set > >>>> the DD bit to zero (see the description of the DD bit in this section). > >>>> This means > >>>> that header buffer addresses are always word aligned." > >>>> > >>>> Right now, in ixgbe PMD we always setup Packet Buffer Address (PBA)and > >>>> Header Buffer Address (HBA) to the same value: > >>>> buf_physaddr + RTE_PKTMBUF_HEADROOM > >>>> So when pirv_size==1, DD bit in RXD is always set to one by SW itself, > >>>> and then > >>>> SW considers that HW already done with it. > >>>> In other words, right now for ixgbe you can't use RX buffer that is not > >>>> aligned on > >>>> word boundary. > >>>> > >>>> So the advice would be, right now - don't set priv_size to the odd value. > >>>> As we don't support split header feature anyway, I think we can fix it > >>>> just by > >>>> always setting HBA in the RXD to zero. > >>>> Could you try the fix for ixgbe below? > >>>> > >>>> Same story with FVL, I believe. > >>>> Konstantin > >>>> > >>>> > >>>>> Interestingly this does not happen if we force the scattered rx path. > >>>>> > >>>>> I assume the drivers have some expectations regarding the alignment of > >>>>> the buf_addr in the mbuf and setting an odd private are size breaks > >>>>> this alignment in the rte_pktmbuf_init function. If this is the case > >>>>> then one possible fix might be to enforce an alignment on the private > >>>>> area size. > >>>>> > >>>>> Best regards, > >>>>> Martin > >>>> > >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> index a0c8847..94967c5 100644 > >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, > >>>> bool > >>>> reset_mbuf) > >>>> > >>>> /* populate the descriptors */ > >>>> dma_addr = > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); > >>>> - rxdp[i].read.hdr_addr = dma_addr; > >>>> + rxdp[i].read.hdr_addr = 0; > >>>> rxdp[i].read.pkt_addr = dma_addr; > >>>> } > >>>> > >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf > >>>> **rx_pkts, > >>>> rxe->mbuf = nmb; > >>>> dma_addr = > >>>> > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); > >>>> - rxdp->read.hdr_addr = dma_addr; > >>>> + rxdp->read.hdr_addr = 0; > >>>> rxdp->read.pkt_addr = dma_addr; > >>>> > >>>> /* > >>>> @@ -1741,7 +1741,7 @@ next_desc: > >>>> rxe->mbuf = nmb; > >>>> > >>>> rxm->data_off = RTE_PKTMBUF_HEADROOM; > >>>> - rxdp->read.hdr_addr = dma; > >>>> + rxdp->read.hdr_addr = 0; > >>>> rxdp->read.pkt_addr = dma; > >>>> } else > >>>> rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@ > >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) > >>>> dma_addr = > >>>> > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); > >>>> rxd = &rxq->rx_ring[i]; > >>>> - rxd->read.hdr_addr = dma_addr; > >>>> + rxd->read.hdr_addr = 0; > >>>> rxd->read.pkt_addr = dma_addr; > >>>> rxe[i].mbuf = mbuf; > >>>> } > >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> index 6c1647e..16a9c64 100644 > >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) > >>>> RTE_PKTMBUF_HEADROOM); > >>>> __m128i dma_addr0, dma_addr1; > >>>> > >>>> + const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX); > >>>> + > >>>> rxdp = rxq->rx_ring + rxq->rxrearm_start; > >>>> > >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 > >>>> +110,9 @@ > >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) > >>>> dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room); > >>>> dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); > >>>> > >>>> + dma_addr0 = _mm_and_si128(dma_addr0, hba_msk); > >>>> + dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); > >>>> + > >>>> /* flush desc with pa dma_addr */ > >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); > >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); > >>>> bash-4.2$ cat patch1 diff --git a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> b/drivers/net/ixgbe/ixgbe_rxtx.c index a0c8847..94967c5 100644 > >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx.c > >>>> @@ -1183,7 +1183,7 @@ ixgbe_rx_alloc_bufs(struct ixgbe_rx_queue *rxq, > >>>> bool > >>>> reset_mbuf) > >>>> > >>>> /* populate the descriptors */ > >>>> dma_addr = > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mb)); > >>>> - rxdp[i].read.hdr_addr = dma_addr; > >>>> + rxdp[i].read.hdr_addr = 0; > >>>> rxdp[i].read.pkt_addr = dma_addr; > >>>> } > >>>> > >>>> @@ -1414,7 +1414,7 @@ ixgbe_recv_pkts(void *rx_queue, struct rte_mbuf > >>>> **rx_pkts, > >>>> rxe->mbuf = nmb; > >>>> dma_addr = > >>>> > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(nmb)); > >>>> - rxdp->read.hdr_addr = dma_addr; > >>>> + rxdp->read.hdr_addr = 0; > >>>> rxdp->read.pkt_addr = dma_addr; > >>>> > >>>> /* > >>>> @@ -1741,7 +1741,7 @@ next_desc: > >>>> rxe->mbuf = nmb; > >>>> > >>>> rxm->data_off = RTE_PKTMBUF_HEADROOM; > >>>> - rxdp->read.hdr_addr = dma; > >>>> + rxdp->read.hdr_addr = 0; > >>>> rxdp->read.pkt_addr = dma; > >>>> } else > >>>> rxe->mbuf = NULL; @@ -3633,7 +3633,7 @@ > >>>> ixgbe_alloc_rx_queue_mbufs(struct ixgbe_rx_queue *rxq) > >>>> dma_addr = > >>>> > >>>> rte_cpu_to_le_64(RTE_MBUF_DATA_DMA_ADDR_DEFAULT(mbuf)); > >>>> rxd = &rxq->rx_ring[i]; > >>>> - rxd->read.hdr_addr = dma_addr; > >>>> + rxd->read.hdr_addr = 0; > >>>> rxd->read.pkt_addr = dma_addr; > >>>> rxe[i].mbuf = mbuf; > >>>> } > >>>> diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> index 6c1647e..16a9c64 100644 > >>>> --- a/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec.c > >>>> @@ -56,6 +56,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) > >>>> RTE_PKTMBUF_HEADROOM); > >>>> __m128i dma_addr0, dma_addr1; > >>>> > >>>> + const __m128i hba_msk = _mm_set_epi64x(0, UINT64_MAX); > >>>> + > >>>> rxdp = rxq->rx_ring + rxq->rxrearm_start; > >>>> > >>>> /* Pull 'n' more MBUFs into the software ring */ @@ -108,6 > >>>> +110,9 @@ > >>>> ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) > >>>> dma_addr0 = _mm_add_epi64(dma_addr0, hdr_room); > >>>> dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); > >>>> > >>>> + dma_addr0 = _mm_and_si128(dma_addr0, hba_msk); > >>>> + dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); > >>>> + > >>>> /* flush desc with pa dma_addr */ > >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); > >>>> _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); > >