On 10/16/2020 8:48 AM, Viacheslav Ovsiienko wrote:
The DPDK datapath in the transmit direction is very flexible.
An application can build multi-segment packets and manage
almost all data aspects - the memory pools the segments
are allocated from, the segment lengths, the memory attributes
like external buffers registered for DMA, etc.
In the receive direction the datapath is much less flexible:
an application can only specify the memory pool to configure
the receive queue, and nothing more. In order to extend the receive
datapath capabilities it is proposed to add a way to provide
extended information describing how to split the packets being received.
The new offload flag RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT in the device
capabilities is introduced to provide a way for a PMD to report
that it supports splitting received packets into configurable
segments. Prior to invoking the rte_eth_rx_queue_setup() routine
the application should check the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag.
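For example, a minimal sketch of such a check, assuming the flag is
exposed through the rx_offload_capa field of rte_eth_dev_info like
other Rx offloads (port_id is a placeholder here):

struct rte_eth_dev_info dev_info;

rte_eth_dev_info_get(port_id, &dev_info);
if ((dev_info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT) == 0) {
        /* PMD cannot split received packets into configured segments */
        return -ENOTSUP;
}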
The following structure is introduced to specify the Rx packet
segment for RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT offload:
struct rte_eth_rxseg_split {
struct rte_mempool *mp; /* memory pool to allocate segment from */
uint16_t length; /* segment maximal data length,
configures "split point" */
uint16_t offset; /* data offset from beginning
of mbuf data buffer */
uint32_t reserved; /* reserved field */
};
The segment descriptions are added to the rte_eth_rxconf structure:
rx_seg - pointer to the array of segment descriptions, each element
         describes the memory pool, maximal data length, and initial
         data offset from the beginning of the data buffer in the mbuf.
         This array allows specifying different settings for each
         segment individually.
rx_nseg - number of elements in the array
If the extended segment descriptions are provided via these new
fields, the mp parameter of rte_eth_rx_queue_setup() must be
specified as NULL to avoid ambiguity.
There are two options to specify the Rx buffer configuration:
- mp is not NULL, rx_conf.rx_seg is NULL, rx_conf.rx_nseg is zero:
  this is the compatible configuration, it follows the existing
  implementation and provides a single pool with no description of
  segment sizes and offsets.
- mp is NULL, rx_conf.rx_seg is not NULL, rx_conf.rx_nseg is not
  zero: this provides the extended configuration, individually for
  each segment (see the sketch after this list).
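For illustration, a minimal sketch of the extended path could look
as follows (the pool pointers and queue parameters are placeholders,
and the cast from rte_eth_rxseg_split to rte_eth_rxseg is an
assumption, since the relation between these two types is not shown
in this excerpt):

struct rte_eth_rxseg_split rx_seg[2] = {
        {
                .mp = header_pool,  /* placeholder pool for packet headers */
                .length = 64,       /* split point after the first 64 bytes */
                .offset = 0,
        },
        {
                .mp = payload_pool, /* placeholder pool for the remaining data */
                .length = 0,        /* zero - buffer size deduced from the pool */
                .offset = 0,
        },
};
struct rte_eth_rxconf rx_conf = {
        .offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT |
                    RTE_ETH_RX_OFFLOAD_SCATTER,
        .rx_seg = (struct rte_eth_rxseg *)rx_seg,
        .rx_nseg = 2,
};

/* The mp argument must be NULL when the extended description is used. */
ret = rte_eth_rx_queue_setup(port_id, rx_queue_id, nb_rx_desc,
                             socket_id, &rx_conf, NULL);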
If the Rx queue is configured with the new settings, the packets
being received will be split into multiple segments pushed to mbufs
with the specified attributes. The PMD splits the received packets
into multiple segments according to the specification in the
description array.
For example, let's suppose we configured the Rx queue with the
following segments:
seg0 - pool0, len0=14B, off0=2B
seg1 - pool1, len1=20B, off1=128B
seg2 - pool2, len2=20B, off2=0B
seg3 - pool3, len3=512B, off3=0B
A packet 46 bytes long will look like the following:
seg0 - 14B long @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
seg1 - 20B long @ 128 in mbuf from pool1
seg2 - 12B long @ 0 in mbuf from pool2
A packet 1500 bytes long will look like the following:
seg0 - 14B @ RTE_PKTMBUF_HEADROOM + 2 in mbuf from pool0
seg1 - 20B @ 128 in mbuf from pool1
seg2 - 20B @ 0 in mbuf from pool2
seg3 - 512B @ 0 in mbuf from pool3
seg4 - 512B @ 0 in mbuf from pool3
seg5 - 422B @ 0 in mbuf from pool3
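Under the same assumptions as the sketch above, the segment layout
used in this example could be described with the following array
(pool0..pool3 are placeholder mempool pointers):

struct rte_eth_rxseg_split rx_seg[4] = {
        { .mp = pool0, .length = 14,  .offset = 2 },   /* seg0 */
        { .mp = pool1, .length = 20,  .offset = 128 }, /* seg1 */
        { .mp = pool2, .length = 20,  .offset = 0 },   /* seg2 */
        { .mp = pool3, .length = 512, .offset = 0 },   /* seg3 */
};

Packets longer than 14 + 20 + 20 + 512 = 566 bytes continue in
additional mbufs taken from pool3, the last valid element.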
The RTE_ETH_RX_OFFLOAD_SCATTER offload must be present and
configured to support the new buffer split feature (if rx_nseg
is greater than one).
The split limitations imposed by the underlying PMD are reported
in the newly introduced rte_eth_dev_info->rx_seg_capa field.
The new approach would allow splitting the ingress packets into
multiple parts pushed to memory with different attributes.
For example, the packet headers can be pushed to the embedded
data buffers within mbufs and the application data into
external buffers attached to mbufs allocated from different
memory pools. The memory attributes of the split parts may
differ as well - for example, the application data may be pushed
into external memory located on a dedicated physical device,
say a GPU or NVMe. This would improve the flexibility of the
DPDK receive datapath while preserving compatibility with the
existing API.
Signed-off-by: Viacheslav Ovsiienko <viachesl...@nvidia.com>
Acked-by: Ajit Khaparde <ajit.khapa...@broadcom.com>
Acked-by: Jerin Jacob <jer...@marvell.com>
<...>
+/**
* A structure used to configure an RX ring of an Ethernet port.
*/
struct rte_eth_rxconf {
@@ -977,6 +998,46 @@ struct rte_eth_rxconf {
uint16_t rx_free_thresh; /**< Drives the freeing of RX descriptors. */
uint8_t rx_drop_en; /**< Drop packets if no descriptors are available. */
uint8_t rx_deferred_start; /**< Do not start queue with rte_eth_dev_start(). */
+ uint16_t rx_nseg; /**< Number of descriptions in rx_seg array. */
+ /**
+ * Points to the array of segment descriptions. Each array element
+ * describes the properties of one segment of the receive
+ * buffer according to the feature-describing structure.
+ *
+ * The supported capabilities of receive segmentation are reported
+ * in the rte_eth_dev_info->rx_seg_capa field.
+ *
+ * If the RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT flag is set in the offloads field,
+ * the PMD will split the received packets into multiple segments
+ * according to the specification in the description array:
+ *
+ * - the first network buffer will be allocated from the memory pool
+ * specified in the first array element, the second buffer from the
+ * pool in the second element, and so on.
+ *
+ * - the offsets from the segment description elements specify
+ * the data offset from the buffer beginning, except for the first
+ * mbuf, where the offset is added to RTE_PKTMBUF_HEADROOM.
+ *
+ * - the lengths in the elements define the maximal amount of data
+ * received into each segment. The receiving starts with filling
+ * up the first mbuf data buffer up to the specified length. If
+ * there is data remaining (the packet is longer than the buffer in
+ * the first mbuf) the following data will be pushed to the next
+ * segment up to its own length, and so on.
+ *
+ * - If the length in the segment description element is zero,
+ * the actual buffer size will be deduced from the appropriate
+ * memory pool properties.
+ *
+ * - if there are not enough elements to describe the buffer for the
+ * entire packet of maximal length, the following parameters will be
+ * used for all the remaining segments:
+ * - pool from the last valid element
+ * - the buffer size from this pool
+ * - zero offset
+ */
+ struct rte_eth_rxseg *rx_seg;
"struct rte_eth_rxconf" is very commonly used, I think all applications does the
'rte_eth_rx_queue_setup()', but "buffer split" is not a common usage,
I am against the "struct rte_eth_rxseg *rx_seg;" field creating this much noise
in the "struct rte_eth_rxconf" documentation.
As mentioned before, can you please move the above detailed documentation to
where "struct rte_eth_rxseg" defined, and in this struct put a single comment
for "struct rte_eth_rxseg *rx_seg" ?