I've added this topic to the agenda for today's ARCH call. I agree with Petri that any changes should be based on measurements, and preferably real application measurements rather than microbenchmarks.
On Wed, May 24, 2017 at 6:46 AM, Savolainen, Petri (Nokia - FI/Espoo) <[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: lng-odp [mailto:[email protected]] On Behalf Of
>> Sachin Saxena
>> Sent: Wednesday, May 24, 2017 12:43 PM
>> To: [email protected]
>> Subject: Re: [lng-odp] APIs for dealing with compact handles
>>
>> Thanks Bill for initiating the thread.
>>
>> Please check out some more details (inline) on the requirements &
>> proposal.
>>
>>
>> On 5/17/2017 8:50 PM, Bill Fischofer wrote:
>> > This thread is to discuss ideas and proposed solutions to two issues
>> > that have been raised by Sachin relating to VPP needs, as well as
>> > Honnappa relating to the scalable scheduler.
>> >
>> > Background
>> > =========
>> >
>> > ODP handles are abstract types that implementations may define to be
>> > of arbitrary bit width in size. However, for a number of reasons
>> > (e.g., provision of strong typing support, ABI compatibility,
>> > efficiency of internal manipulation, etc.) these are typically
>> > represented as 64-bit quantities.
>> >
>> > Some applications that store handles in their own structures wish to
>> > minimize the cache footprint consumed by these structures and so
>> > would like an option to store handles in a more compact format that
>> > uses a smaller number of bits. To date 32 bits seems sufficient for
>> > application need, however in theory 16 or even 8 bits might be
>> > desirable in some circumstances.
>> >
>> > We already have an example of 8-bit handles in the odp_packet_seg_t
>> > type, where odp-linux uses an 8-bit representation of this type as a
>> > segment index when ODP is configured with --enable-abi-compat=no
>> > while using a 64-bit size when configured with
>> > --enable-abi-compat=yes.
>> >
>> > Considerations
>> > ============
>> >
>> > In choosing the bit width to use in representing handles there are
>> > two main considerations that implementations must take into
>> > account.
>> >
>> > First, to achieve strong typing in C, handles need to be of pointer
>> > width. For development this is a very valuable feature, which is why
>> > implementations are encouraged to provide strong typing for ODP
>> > abstract types.
>> >
>> > Second, for ABI compatibility it is required that all
>> > implementations use the same width for types that are to be ABI
>> > compatible across different implementations. Implementations may
>> > interpret the bits of a handle very differently, but all must agree
>> > that handles are of the same bit width if they wish to be binary
>> > compatible with each other.
>> >
>> > Stated Needs
>> > ===========
>> >
>> > VPP currently packages its metadata into a vlib_mbuf struct that is
>> > used pervasively to reference packets that are being processed by
>> > VPP nodes. The address of this struct is desired to be held in
>> > compressed (32-bit) format. Today the vlib_mbuf is implemented as a
>> > user area associated with an odp_packet_t. As such the
>> > odp_packet_user_area() API returns a (64-bit) pointer. What is
>> > desired is a compact representation of this address.
>>
>> VPP collects a bunch of packets from the ODP/DPDK input node and looks
>> for the inline "struct vlib_buffer" address in each packet.
>> Then it creates a VPP Library Frame, which is a collection of the
>> vlib_buffers (vectors). For this, VPP converts the 64-bit address of
>> each vlib_buffer to a 32-bit index, saves it in the VLib frame, and
>> passes this frame to the next Node.
>> In each processing node in the data path where packet contents are
>> accessed, VPP converts this 32-bit index to the actual 64-bit address
>> to get the packet data pointer.
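
To make the conversion described above concrete, here is a minimal
sketch of a 32-bit index <-> 64-bit address mapping. It is an
illustration under stated assumptions, not the VPP or ODP code: it
assumes all buffers live in one contiguous region and sit at 64-byte
multiples from its start. buf_pool_t, buf_to_index() and index_to_buf()
are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: buffers are spaced at multiples of 64 bytes. */
#define BUF_LOG2_ALIGN 6

/* Hypothetical pool descriptor; not a VPP or ODP type. */
typedef struct {
    uint8_t *base; /* start of the contiguous buffer region */
    size_t   size; /* region size in bytes */
} buf_pool_t;

/* Compress a buffer address into a 32-bit index (offset in 64-byte units). */
static inline uint32_t buf_to_index(const buf_pool_t *p, const void *buf)
{
    uintptr_t off = (uintptr_t)buf - (uintptr_t)p->base;

    assert(off < p->size);
    assert((off & ((1u << BUF_LOG2_ALIGN) - 1)) == 0);
    return (uint32_t)(off >> BUF_LOG2_ALIGN);
}

/* Expand a 32-bit index back into a full-width pointer. */
static inline void *index_to_buf(const buf_pool_t *p, uint32_t idx)
{
    size_t off = (size_t)idx << BUF_LOG2_ALIGN;

    assert(off < p->size);
    return p->base + off;
}
```

Each expansion is a shift and an add, so the cost per call is small, but
it is paid at every place the index is dereferenced.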
>> In the current implementation, VPP converts the 32-bit index to an
>> address in ~900 places in the overall code via the API:
>> vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
>>
>>
>> *Code reference*:
>> GIT: https://git.fd.io/odp4vpp/log/
>> Files: vlib/vlib/buffer_funcs.h
>> vlib/vlib/buffer.h
>
>
> Have you considered / tested the performance impact of changing buffer_index
> to u64? Surely, you can now pack more u32 indexes per cache line, but you
> need to do the conversion many times (900 places hints that it's done
> multiple times per forwarded packet), which also consumes CPU cycles. I'd
> like to get some numbers on how much better off you are with 32-bit vs 64-bit
> handles. 64-bit handles would remove the need for conversions on both levels -
> neither the application nor ODP would need to convert, as the application
> would store odp_packet_t, which would be a direct pointer to the packet
> structure. E.g. HW data prefetchers may have improved since the time when VPP
> was originally designed.
>
> If <64-bit indexes are needed, the only viable option to me is a packet index
> conversion API (no user area indexes). The downside of it is that every
> implementation must be able to do those conversions (efficiently). Also I'd
> say that the index size would need to be 32 bits, so memory savings would be
> only 2x in a 64-bit system.
>
> Odp-linux used to define odp_packet_t as a 32-bit index, but it was changed
> to a pointer since that improved performance with the l2fwd app by about 10%.
> L2fwd is kind of a worst-case app since it has very few cycles per packet.
>
> Another option for VPP is to maintain a packet context table. You'd save the
> packet handle into the table and use this table index internally as
> "buffer_index".
>
>
> -Petri
>
>
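
For discussion, the packet context table idea could look roughly like
the sketch below. This is only an illustration under assumptions: the
table is fixed-size and single-threaded, pkt_handle_t is a stand-in for
odp_packet_t so the example compiles without ODP headers, and the
ctx_* names are hypothetical, not part of ODP or VPP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CTX_TABLE_SIZE 1024u      /* assumed capacity, chosen arbitrarily */
#define CTX_INVALID    UINT32_MAX /* returned when the table is full */

/* Stand-in for odp_packet_t (assumption: a pointer-width handle). */
typedef void *pkt_handle_t;

/* Hypothetical table: full-width handles plus a stack of free slots. */
typedef struct {
    pkt_handle_t handle[CTX_TABLE_SIZE];
    uint32_t     free_stack[CTX_TABLE_SIZE];
    uint32_t     free_count;
} ctx_table_t;

static void ctx_table_init(ctx_table_t *t)
{
    /* Fill the stack so that slot 0 is handed out first. */
    for (uint32_t i = 0; i < CTX_TABLE_SIZE; i++)
        t->free_stack[i] = CTX_TABLE_SIZE - 1 - i;
    t->free_count = CTX_TABLE_SIZE;
}

/* Store a handle; the returned 32-bit slot is the "buffer_index". */
static uint32_t ctx_put(ctx_table_t *t, pkt_handle_t h)
{
    if (t->free_count == 0)
        return CTX_INVALID;

    uint32_t idx = t->free_stack[--t->free_count];
    t->handle[idx] = h;
    return idx;
}

/* Recover the full-width handle from a 32-bit index. */
static pkt_handle_t ctx_get(const ctx_table_t *t, uint32_t idx)
{
    assert(idx < CTX_TABLE_SIZE);
    return t->handle[idx];
}

/* Return a slot to the free stack once the packet is done. */
static void ctx_release(ctx_table_t *t, uint32_t idx)
{
    assert(idx < CTX_TABLE_SIZE && t->free_count < CTX_TABLE_SIZE);
    t->handle[idx] = NULL;
    t->free_stack[t->free_count++] = idx;
}
```

With this layout the lookup is one indexed load, at the price of an
extra memory access per conversion and of managing slot lifetime in the
application. A multi-threaded data path would need per-thread tables or
synchronization, which this sketch leaves out.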
