Thanks Bill for initiating the thread.
Please check out some more details (inline) on the requirements and
proposal.
On 5/17/2017 8:50 PM, Bill Fischofer wrote:
This thread is to discuss ideas and proposed solutions to two issues
that have been raised by Sachin relating to VPP needs, as well as by
Honnappa relating to the scalable scheduler.

Background
==========

ODP handles are abstract types that implementations may define to be
of arbitrary bit width. However, for a number of reasons (e.g.,
provision of strong typing support, ABI compatibility, efficiency of
internal manipulation, etc.) these are typically represented as 64-bit
quantities. Some applications that store handles in their own
structures wish to minimize the cache footprint consumed by these
structures and so would like an option to store handles in a more
compact format that uses a smaller number of bits. To date, 32 bits
seem sufficient for application needs; however, in theory 16 or even 8
bits might be desirable in some circumstances. We already have an
example of 8-bit handles in the odp_packet_seg_t type, where odp-linux
uses an 8-bit representation of this type as a segment index when ODP
is configured with --enable-abi-compat=no, while using a 64-bit size
when configured with --enable-abi-compat=yes.

Considerations
==============

In choosing the bit width used to represent handles, there are two
main considerations that implementations must take into account.
First, to achieve strong typing in C, handles need to be of pointer
width. For development this is a very valuable feature, which is why
implementations are encouraged to provide strong typing for ODP
abstract types. Second, ABI compatibility requires that all
implementations use the same width for types that are to be ABI
compatible across different implementations. Implementations may
interpret the bits of a handle very differently, but all must agree
that handles are of the same bit width if they wish to be binary
compatible with each other.

Stated Needs
============

VPP currently packages its metadata into a vlib_mbuf struct that is
used pervasively to reference packets that are being processed by VPP
nodes. The address of this struct is desired to be held in compressed
(32-bit) format. Today the vlib_mbuf is implemented as a user area
associated with an odp_packet_t. As such, the odp_packet_user_area()
API returns a (64-bit) pointer. What is desired is a compact
representation of this address.
VPP collects a bunch of packets from the ODP/DPDK input node and looks
up the inline "struct vlib_buffer" address in each packet.
It then creates a VPP vlib frame, which is a collection (vector) of
vlib_buffers. For this, VPP converts the 64-bit address of each
vlib_buffer to a 32-bit index, saves it in the vlib frame, and passes
the frame to the next node.
In each data-path processing node where packet contents are accessed,
VPP converts this 32-bit index back to the actual 64-bit address to
get the packet data pointer.
In the current implementation, VPP converts a 32-bit index to an
address at ~900 places in the overall code via the API:

vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
Code reference:
GIT: https://git.fd.io/odp4vpp/log/
Files: vlib/vlib/buffer_funcs.h
       vlib/vlib/buffer.h
VPP on the transmit side also needs to obtain the odp_packet_t
associated with a vlib_mbuf. For the scalable scheduler, the desire is
for a compact representation of an odp_event_t that can be stored in a
space-efficient manner in queues.

Proposed Solutions
==================

Outlined here are a couple of proposed solutions to these problems.
Please feel free to propose alternate solutions as well. For the case
of the compact user area pointers needed by VPP, the suggestion has
been made that ODP pools provide an API to return pool bounds
information so that VPP can convert the user area pointers to a more
compact index. However, this makes a number of assumptions about the
internals of ODP pools that may or may not be portable or practical in
all implementations. Since the requirement is for a compact
representation of the user area address, a more direct solution may be
simply to provide a set of new APIs that address this need directly:

uint32_t odp_packet_user_area_index(odp_packet_t pkt);

This API would return a 32-bit index of the user area associated with
an odp_packet_t. Note that since user areas are mapped one-to-one with
ODP packets, this can serve effectively as a packet index as well.
With this API, applications can obtain the user area address directly
or, indirectly, in a compact form. The problem is converting the index
back into the user area address. An API of the form:

void *odp_packet_user_area_addr(uint32_t ndx);

assumes that this is a reversible mapping, which probably isn't true.
However, adding the odp_packet_t as a second argument would be
pointless since, if the application has the odp_packet_t, it can use
the existing odp_packet_user_area() API directly. So the containing
pool would seem a necessary second argument:

void *odp_packet_user_area_addr(uint32_t ndx, odp_pool_t pool);

These APIs seem awkward as well, so perhaps recasting them as a way to
get compact packet handles might be better:

uint32_t odp_packet_to_index(odp_pool_t pool, odp_packet_t pkt);
odp_packet_t odp_packet_from_index(odp_pool_t pool, uint32_t ndx);

An interesting aside is that, given this general approach, an
additional API could be envisioned that would provide even more
compact packet indexes:

uint16_t odp_packet_to_index_16(odp_pool_t pool, odp_packet_t pkt);

Note that a single odp_packet_from_index() suffices, since uint16_t
indexes will promote to a uint32_t argument without problem. The
odp_pool_capability() API could indicate whether this additional
compact form is supported, and of course this would only be possible
if the pool's pkt.num is < 64K. With these APIs, the compact vlib_mbuf
requirement would seem to be satisfied by the following routines:

uint32_t vlib_mbuf_index(odp_packet_t pkt)
{
	return odp_packet_to_index(odp_packet_pool(pkt), pkt);
}

void *vlib_mbuf_addr(odp_pool_t pool, uint32_t ndx)
{
	return odp_packet_user_area(odp_packet_from_index(pool, ndx));
}
The API "vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)" is
called not only in the ODP/DPDK node's transmit path
but also in VPP's internal "vnet" nodes and the "vlib" data-path
implementation.
For example, when VPP is running as a vSwitch, the following node
functions call this API to get a buffer address from an index on the
RX-to-TX path:
1. VNET ethernet node processing
2. l2input_node_fn in l2_input processing
3. l2flood_node_fn
4. l2output_node_fn
5. vnet_interface_output_node
6. odp_packet_interface_tx
That means that changing the implementation of vlib_get_buffer() to
call odp_packet_from_index() requires changing the VPP internal
framework API, which may need discussion before it is accepted.
Also, to keep vlib_get_buffer() compatible with the existing DPDK
node, we would need to guard our code with compile-time flags, which
may look like:
always_inline vlib_buffer_t *
vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
{
#if ODP
  /* pool is assumed to be available from VPP context (e.g., stored in
   * vm at init time); odp_packet_from_index() is the proposed API. */
  return (vlib_buffer_t *)
    odp_packet_user_area (odp_packet_from_index (pool, buffer_index));
#else
  return vlib_physmem_at_offset (&vm->physmem_main,
				 ((uword) buffer_index)
				 << CLIB_LOG2_CACHE_LINE_BYTES);
#endif
}
Conversely, the need to obtain the odp_packet_t from the vlib_mbuf
index would be satisfied simply by:

odp_packet_t vlib_mbuf_to_pkt(odp_pool_t pool, uint32_t ndx)
{
	return odp_packet_from_index(pool, ndx);
}

For the scalable scheduler, since this is internal to the ODP
implementation, there doesn't seem to be a need for any new external
APIs. Internal _odp_event_to_index() and _odp_event_from_index() APIs
could, however, be modeled on this approach to achieve the same
effect.