I've added this topic to the agenda for today's ARCH call. I agree
with Petri that any changes should be based on measurements, and
preferably real application measurements rather than microbenchmarks.

On Wed, May 24, 2017 at 6:46 AM, Savolainen, Petri (Nokia - FI/Espoo)
<[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: lng-odp [mailto:[email protected]] On Behalf Of
>> Sachin Saxena
>> Sent: Wednesday, May 24, 2017 12:43 PM
>> To: [email protected]
>> Subject: Re: [lng-odp] APIs for dealing with compact handles
>>
>> Thanks Bill for initiating the thread.
>>
>> Please check out some more details (inline) on the requirements &
>> proposal.
>>
>>
>> On 5/17/2017 8:50 PM, Bill Fischofer wrote:
>> > This thread is to discuss ideas and proposed solutions to two issues
>> > that have been raised by Sachin relating to VPP needs, as well as
>> > Honnappa relating to the scalable scheduler.
>> >
>> > Background
>> > ==========
>> >
>> > ODP handles are abstract types that implementations may define to be
>> > of arbitrary bit width in size. However, for a number of reasons
>> > (e.g., provision of strong typing support, ABI compatibility,
>> > efficiency of internal manipulation, etc.) these are typically
>> > represented as 64-bit quantities. Some applications that store
>> > handles in their own structures wish to minimize the cache footprint
>> > consumed by these structures and so would like an option to store
>> > handles in a more compact format that uses a smaller number of bits.
>> > To date 32-bits seems sufficient for application need, however in
>> > theory 16 or even 8 bits might be desirable in some circumstances. We
>> > already have an example of 8-bit handles in the odp_packet_seg_t
>> > type, where odp-linux uses an 8-bit representation of this type as a
>> > segment index when ODP is configured with --enable-abi-compat=no
>> > while using a 64-bit size when configured with
>> > --enable-abi-compat=yes.
>> >
>> > Considerations
>> > ==============
>> >
>> > In choosing the bit width to use in representing handles there are
>> > two main considerations that implementations must take into account.
>> > First, to achieve strong typing in C, handles need to be of pointer
>> > width. For development this is a very valuable feature, which is why
>> > implementations are encouraged to provide strong typing for ODP
>> > abstract types. Second, for ABI compatibility it is required that all
>> > implementations use the same width for types that are to be ABI
>> > compatible across different implementations. Implementations may
>> > interpret the bits of a handle very differently, but all must agree
>> > that handles are of the same bit width if they wish to be binary
>> > compatible with each other.
>> >
>> > Stated Needs
>> > ============
>> >
>> > VPP currently packages its metadata into a vlib_mbuf struct that is
>> > used pervasively to reference packets that are being processed by VPP
>> > nodes. The address of this struct is desired to be held in compressed
>> > (32-bit) format. Today the vlib_mbuf is implemented as a user area
>> > associated with an odp_packet_t. As such the odp_packet_user_area()
>> > API returns a (64-bit) pointer. What is desired is a compact
>> > representation of this address.
>> VPP collects a batch of packets from the ODP/DPDK input node and looks
>> for the inline "struct vlib_buffer" address in each packet.
>> It then creates a VPP library frame, which is a collection (vector) of
>> vlib_buffers. For this, VPP converts the 64-bit address of each
>> vlib_buffer to a 32-bit index, saves it in the vlib frame, and passes
>> this frame to the next node.
>> In each data-path processing node where packet contents are accessed,
>> VPP converts this 32-bit index back to the actual 64-bit address to get
>> the packet data pointer.
>> In the current implementation, VPP converts a 32-bit index to an address
>> in ~900 places across the code via the API:
>>              vlib_get_buffer (vlib_main_t * vm, u32 buffer_index)
>>
>>
>> *Code reference*:
>> GIT: https://git.fd.io/odp4vpp/log/
>> Files:              vlib/vlib/buffer_funcs.h
>> vlib/vlib/buffer.h
>
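For readers following along, here is a minimal sketch of the kind of
address <-> 32-bit index conversion described above. It is not the actual
VPP code: the pool layout, the 64-byte cache-line granularity, and the
names buf_pool_t, buf_ptr_to_index() and buf_index_to_ptr() are all
assumptions made for illustration. It only shows why the scheme needs
every buffer to sit in one contiguous region with a known base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: buffers start on 64-byte boundaries relative to the pool base. */
#define CACHE_LINE_SHIFT 6u

/* Hypothetical pool descriptor: one contiguous region holding all buffers. */
typedef struct {
    uint8_t *base;  /* start of the buffer pool */
    size_t   size;  /* pool size in bytes */
} buf_pool_t;

/* 64-bit address -> 32-bit index: offset from the base in cache-line units. */
static inline uint32_t buf_ptr_to_index(const buf_pool_t *pool, const void *p)
{
    uintptr_t off = (uintptr_t)((const uint8_t *)p - pool->base);

    assert(off < pool->size);                              /* inside the pool */
    assert((off & ((1u << CACHE_LINE_SHIFT) - 1u)) == 0);  /* line aligned */
    return (uint32_t)(off >> CACHE_LINE_SHIFT);
}

/* 32-bit index -> 64-bit address: one shift and one add per lookup. */
static inline void *buf_index_to_ptr(const buf_pool_t *pool, uint32_t index)
{
    return pool->base + ((size_t)index << CACHE_LINE_SHIFT);
}
```

With a 64-byte granule a 32-bit index can span 256 GB of pool, and a
64-byte cache line holds 16 such indexes instead of 8 pointers. The cost
is the shift-and-add on every access.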
>
> Have you considered / tested the performance impact of changing buffer_index
> to u64? Sure, you can pack more u32 indexes per cache line now, but you
> need to do the conversion many times (900 places hints that it's done
> multiple times per forwarded packet), which also consumes CPU cycles. I'd
> like to get some numbers on how much better off you are with 32-bit vs 64-bit
> handles. 64-bit handles would remove the need for conversions on both levels:
> neither the application nor ODP would need to convert, as the application
> would store odp_packet_t, which would be a direct pointer to the packet
> structure. E.g., HW data prefetchers may have improved since the time when VPP
> was originally designed.
>
> If <64-bit indexes are needed, the only viable option to me is a packet index
> conversion API (no user area indexes). The downside of it is that every
> implementation must be able to do those conversions (efficiently). Also, I'd
> say that the index size would need to be 32 bits, so memory savings would be
> only 2x on a 64-bit system.
>
> Odp-linux used to define odp_packet_t as a 32-bit index, but it was changed to
> a pointer since that improved performance with the l2fwd app by about 10%.
> L2fwd is kind of a worst-case app since it has very few cycles per packet.
>
> Another option for VPP is to maintain a packet context table. You'd save
> the packet handle into the table and use the table index internally as
> "buffer_index".
>
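To make the packet context table option concrete, here is a rough
single-threaded sketch of what the application side could look like.
Everything in it is hypothetical: pkt_handle_t is a stand-in for
odp_packet_t, and the ctx_table_* names, the fixed capacity, and the
free-index stack are assumptions of mine, not an existing ODP or VPP
API.

```c
#include <assert.h>
#include <stdint.h>

#define CTX_TABLE_SIZE 1024u       /* hypothetical capacity */
#define CTX_INVALID    UINT32_MAX  /* returned when the table is full */

typedef void *pkt_handle_t;        /* stand-in for odp_packet_t */

typedef struct {
    pkt_handle_t slot[CTX_TABLE_SIZE];      /* index -> full-width handle */
    uint32_t     free_idx[CTX_TABLE_SIZE];  /* stack of unused indexes */
    uint32_t     num_free;
} pkt_ctx_table_t;

/* Mark every slot as free; index 0 ends up on top of the stack. */
static void ctx_table_init(pkt_ctx_table_t *t)
{
    for (uint32_t i = 0; i < CTX_TABLE_SIZE; i++)
        t->free_idx[i] = CTX_TABLE_SIZE - 1u - i;
    t->num_free = CTX_TABLE_SIZE;
}

/* Store a handle and return the 32-bit index used as "buffer_index". */
static uint32_t ctx_table_put(pkt_ctx_table_t *t, pkt_handle_t pkt)
{
    if (t->num_free == 0)
        return CTX_INVALID;

    uint32_t idx = t->free_idx[--t->num_free];
    t->slot[idx] = pkt;
    return idx;
}

/* Convert an index back to the handle: a single array load. */
static pkt_handle_t ctx_table_get(const pkt_ctx_table_t *t, uint32_t idx)
{
    assert(idx < CTX_TABLE_SIZE);
    return t->slot[idx];
}

/* Return an index to the free stack once the packet is done with. */
static void ctx_table_release(pkt_ctx_table_t *t, uint32_t idx)
{
    assert(idx < CTX_TABLE_SIZE && t->num_free < CTX_TABLE_SIZE);
    t->free_idx[t->num_free++] = idx;
}
```

Compared with shifting a pool offset, the lookup here is an extra memory
load, and a multi-core data path would need per-thread tables or some
synchronization, so this also ought to be measured before choosing it.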
>
> -Petri
>
>
