Passing in an application-supplied shm is guesswork since there is no
architected way for an application to know how much storage an
implementation needs to honor any given request.  The way this currently
works is the call succeeds if the supplied shm is large enough and fails
otherwise.  I see no reason to change that behavior.

We originally capped the requested size to ODP_CONFIG_PACKET_BUF_LEN_MAX
but that immediately broke a number of the crypto tests which need 32K
buffers.  That's why the code currently treats requests for large sizes as
a request for an unsegmented pool.
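To make that policy concrete, here is a rough sketch in plain C of how an implementation might pick an effective segment length. The constants and the 256-byte rounding are invented for illustration; they are not the actual ODP implementation:

```c
#include <stdint.h>

/* Illustrative constants; the real values are implementation-defined. */
#define SEG_LEN_MAX  4096u   /* largest HW-backed segment size  */
#define BUF_LEN_MAX 65536u   /* largest supported packet length */

/* Sketch of the policy described above: a seg_len request the HW can
 * satisfy is honored (rounded up to an assumed 256-byte HW block);
 * a larger request is treated as asking for an unsegmented pool,
 * where each packet occupies a single contiguous buffer. */
static uint32_t effective_seg_len(uint32_t requested, int *unsegmented)
{
    if (requested > SEG_LEN_MAX) {
        *unsegmented = 1;
        return requested <= BUF_LEN_MAX ? requested : BUF_LEN_MAX;
    }
    *unsegmented = 0;
    return (requested + 255u) & ~255u;
}
```

Under this sketch the 32K crypto buffers mentioned above simply come back as unsegmented allocations rather than failing outright.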

I don't understand the reluctance to recognize unsegmented pools as a
useful construct.  All of this attempted configurability is easily solved
in a fully portable manner by saying that if an application has some reason
for wanting things to be of a specific size, it can simply request an
unsegmented pool and be assured that packets will never appear segmented to
it.  There was universal agreement last summer that this was useful, so I'm
not sure what changed in the interim.

Performance may suffer on some platforms when using unsegmented pools, but
presumably that's a trade-off the application has consciously chosen.
Otherwise the application should be content to use whatever segment sizes
the implementation chooses since the relevant ODP APIs always return a
seglen to enable it to navigate them as needed, and we've already
stipulated that the implementation-chosen segment size will never be less
than 256 bytes.
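The navigation pattern that seglen enables can be sketched as follows. This is plain C with a toy packet layout standing in for a real ODP pool; pkt_offset() imitates the (pointer, seglen) contract described above but is not the real API. Written this way, the loop is identical whether the pool is segmented or not:

```c
#include <stdint.h>

/* Toy model of a segmented packet: an array of fixed-size segments.
 * A real implementation hands back a (pointer, seglen) pair; the
 * helper below mimics that contract. */
#define SEG_SIZE 256
#define NUM_SEGS 4

typedef struct {
    uint8_t  data[NUM_SEGS][SEG_SIZE];
    uint32_t len; /* total packet length in bytes */
} toy_packet_t;

/* Return a pointer to 'offset' within the packet and, via *seglen,
 * the number of contiguous bytes valid from that pointer. */
static uint8_t *pkt_offset(toy_packet_t *pkt, uint32_t offset,
                           uint32_t *seglen)
{
    uint32_t seg     = offset / SEG_SIZE;
    uint32_t in_seg  = offset % SEG_SIZE;
    uint32_t seg_end = (seg + 1) * SEG_SIZE;
    uint32_t limit   = pkt->len < seg_end ? pkt->len : seg_end;

    *seglen = limit - offset;
    return &pkt->data[seg][in_seg];
}

/* Process a packet one segment at a time, never assuming the data is
 * contiguous; summing bytes stands in for real work. */
static uint32_t process_by_segment(toy_packet_t *pkt)
{
    uint32_t sum = 0, offset = 0, seglen;

    while (offset < pkt->len) {
        uint8_t *p = pkt_offset(pkt, offset, &seglen);
        for (uint32_t i = 0; i < seglen; i++)
            sum += p[i];
        offset += seglen;
    }
    return sum;
}
```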

A bit of a diversion into HW design may explain my obstinacy here.  The
scaling problems with network processing at higher speeds all come down to
memory speeds.  The rule of thumb is that in the time that network speeds
grow by a factor of 10, corresponding memory speeds grow by a factor of
3.  This means that at each step in the progression from 1Gb to 10Gb to
100Gb and beyond, memory bottlenecks become more and more the gating
factor on the ability to handle traffic. Between
10Gb and 100Gb you have to go to NUMA-type architectures because
byte-addressable SMP DRAM simply cannot keep pace.  This has nothing to do
with how clever your SW is or how fast your processors are. It's the memory
latencies and access times that kill you.  Playing around with cache
alignments and such buys you nothing.
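Putting numbers on that rule of thumb shows how quickly the gap compounds. A throwaway helper (the 10x and 3x factors are the rule of thumb above, not measurements):

```c
/* Rule of thumb quoted above: per generation in the 1Gb -> 10Gb ->
 * 100Gb progression, network speed grows 10x while memory speed grows
 * only 3x.  The "gap" is how far memory falls behind the wire after
 * n generations. */
static double memory_gap(int generations)
{
    double net = 1.0, mem = 1.0;

    for (int g = 0; g < generations; g++) {
        net *= 10.0;
        mem *= 3.0;
    }
    return net / mem;
}
```

After two generations (1Gb to 100Gb) the wire is 100x faster while memory is only 9x faster, roughly an 11x per-byte shortfall, which is what forces the restructured packet memories described next.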

So what HW designers do is create segregated DRAMs designed and optimized
for packet processing.  Multiple interleaved memories, in fact, since they
need to be multiplexed to overcome the I/O-memory performance disparity.
These memories have two properties that are relevant to ODP.  First, they
are not directly addressable by processing cores.  This is why there is
always some sort of implicit or explicit mapping function needed to make
the contents of these memories addressable to cores.  Second, they are
block-addressable, not byte-addressable.  The reason for this is that
address decode lines take silicon die area and power and have a real cost
associated with them.  By eliminating byte addressability you make the
memories both faster and cheaper since address decode logic is one of the
more expensive components of any memory subsystem. Cores need byte
addressability to run SW, but HW components use whatever addressability
makes sense for them.  This is where HW packet segmentation arises.  It's
not done for arbitrary reasons and it's "baked into" the design of the
memory system and is not subject to change by SW.

So this is why the implementation chooses packet segment sizes: they are
matched to the block sizes used by the HW packet memory subsystem.  It's
also why SW should process packets on a per-segment basis,
because otherwise it's incurring additional latency until all of the
segments containing a packet can be made available to it.  HW will
typically prefetch the first segment and start the core processing it as
soon as it's available and let the SW request additional segments as needed
since by design it's expected that any information the SW may need (i.e.,
the packet headers) will reside in the first segment.
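As a quick sanity check on that expectation: with the stipulated 256-byte minimum segment size, even a fairly deep header stack fits in the first segment. The sizes below are the standard fixed wire formats (no options or extensions):

```c
/* Standard fixed header sizes on the wire (no options/extensions). */
enum {
    ETH_HDR    = 14,  /* Ethernet II                     */
    VLAN_TAG   = 4,   /* 802.1Q tag                      */
    IPV6_HDR   = 40,  /* fixed IPv6 header               */
    TCP_HDR    = 20,  /* TCP header without options      */
    MIN_SEGLEN = 256  /* stipulated minimum segment size */
};

/* Eth + VLAN + IPv6 + TCP = 78 bytes, leaving 178 bytes of the first
 * 256-byte segment for headroom and payload. */
static int headers_fit_first_seg(void)
{
    return ETH_HDR + VLAN_TAG + IPV6_HDR + TCP_HDR <= MIN_SEGLEN;
}
```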

I hope that makes sense.  The point is that scalable designs can scale down
to lower speeds easily while those designed for concepts that only work at
lower speeds break completely as you try to scale them up to higher
speeds.  So if we design ODP for shared-memory SMP systems we need to
understand that these APIs will need to be completely reworked to enable
applications to work with high-speed devices.  So best to get them right in
the first place.

Bill

On Tue, Feb 3, 2015 at 3:04 PM, Ola Liljedahl <[email protected]>
wrote:

> Here is my alternative approach; it should achieve the same goals as
> Petri's but give more freedom to implementations. You don't have to
> approve of it, I just want to show that given a defined and understood
> problem, many potential solutions exist and the first alternative might
> not be the best. Let's work together.
>
> * init_seg_len:
>         On input: user's required (desired) minimal length of initial
> segment (including headroom)
>         On output: implementation's best effort to match user's request
>         Purpose: ensure that those packet headers the application
> normally will process are stored in a consecutive memory area.
>         Applications do not have to query implementation limits before
> building a configuration which the implementation has to validate
> anyway.
>         Applications should check output values to see if their desired
> values were matched. The application decides whether a failure to
> match is a fatal error or the application can handle the situation
> anyway (e.g. with degraded performance because it has to do some
> workarounds in SW).
>
> * seg_len:
>         On input: user's desired length of other segments
>         On output: implementation's best effort to match user's request
>         Purpose: a hint from the user on how to partition the pool into
> segments for best trade-off between memory utilization and SW
> processing performance.
>         Note: I know some HW can support multiple segment sizes so I
> think it is useful for the API to allow for this. Targets which only
> support one segment size (per packet pool) could use e.g.
> max(init_seg_len, seg_len). Some targets may not allow user-defined
> segment sizes at all, the ODP implementation will just return the
> actual values and the application can check whether those are
> acceptable.
>
> * init_seg_num:
>        On input: Number of initial segments.
>        On output: Updated with actual number of segments if a shared
> memory region was specified?
> * seg_num:
>         On input: Number of other segments.
>        On output: Updated with actual number of segments if a shared
> memory region was specified?
>
> I dislike the defines because they will make a future portable ABI
> (binary interface) definition impossible. We will need a portable ABI
> to support e.g. shared library implementations. So all ODP_CONFIG's
> should only be used internally (by the implementation in question) and
> not be accessible as compile time constants to applications, i.e. they
> should not be part of the ODP API. Are there any other instances where
> the application is supposed to use these constants?
>
> -- Ola
>
> On 3 February 2015 at 14:31, Ola Liljedahl <[email protected]>
> wrote:
> > On 3 February 2015 at 13:59, Petri Savolainen
> > <[email protected]> wrote:
> >> Completed odp_pool_param_t definition with packet pool parameters.
> >> Parameter definition is close to what we are using already.
> >>
> >> * seg_len: Defines minimum segment buffer length.
> >>            With this parameter user can:
> >>            * trade-off between pool memory usage and SW performance
> >>              (linear memory access)
> >>            * avoid segmentation in packet head (e.g. if legacy app
> >>              cannot handle segmentation in the middle of the packet
> >>              headers)
> > We already had defined a minimum segment size for conforming ODP
> > implementations. Isn't that enough?
> >
> > I can see value in specifying the minimum size of the first segment of
> > a packet (which would contain all headers the application is going to
> > process). But this proposal goes much further than that.
> >
> >
> >>            * seg_len < ODP_CONFIG_PACKET_SEG_LEN_MIN is rounded up
> >>              to ODP_CONFIG_PACKET_SEG_LEN_MIN
> >>            * seg_len > ODP_CONFIG_PACKET_SEG_LEN_MAX is not valid
> >>
> >> * seg_align: Defines minimum segment buffer alignment. With this
> >>              parameter, user can force buffer alignment to match e.g.
> >>              alignment requirements of data structures stored in or
> >>              algorithms accessing the packet
> > Can you give a practical example of when this configuration is useful?
> > To my knowledge, most data structures have quite small alignment
> > requirements, e.g. based on alignment requirements of individual
> > fields. But here I assume that we would specify alignment in multiples
> > of cache lines here (because the minimum segment alignment would be
> > the cache line size).
> >
> >>              headroom. When the user doesn't have a specific
> >>              alignment requirement, 0 should be used for the default.
> >>
> >> * seg_num: Number of segments. This is also the maximum number of
> >>            packets.
> > I think these configurations could be hints but not strict
> > requirements. They do not change the *functionality* so an application
> > should not fail if these configurations cannot be obeyed (except for
> > that legacy situation you describe above). The hints enable more
> > optimal utilization of e.g. packet memory and may decrease SW overhead
> > during packet processing but do not change the functionality.
> >
> > To enable different hardware implementations, ODP apps should not
> > enforce unnecessary (non-functional) requirements on the ODP
> > implementations and limit the number of targets ODP can be implemented
> > on. ODP is not DPDK.
> >
> > Applications should also not have to first check the limits of the
> > specific ODP implementation (as you suggested yesterday), adapt their
> > configuration to that and then send back those requirements to the ODP
> > implementation (which still has to check the parameters to verify that
> > they are valid). This is too complicated and will likely lead to code
> > that cheats and thus is not portable. Better for applications just to
> > specify their requested configuration to ODP and then get back the
> > results (i.e. actual values that will be used). The application can
> > then if necessary check that the configuration was honored. This
> > follows the normal programming flow.
> >
> >>
> >> Signed-off-by: Petri Savolainen <[email protected]>
> >> ---
> >>  include/odp/api/pool.h | 26 +++++++++++++++++++++-----
> >>  1 file changed, 21 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/odp/api/pool.h b/include/odp/api/pool.h
> >> index d09d92e..a1d7494 100644
> >> --- a/include/odp/api/pool.h
> >> +++ b/include/odp/api/pool.h
> >> @@ -61,13 +61,29 @@ typedef struct odp_pool_param_t {
> >>                                              of 8. */
> >>                         uint32_t num;   /**< Number of buffers in the pool */
> >>                 } buf;
> >> -/* Reserved for packet and timeout specific params
> >>                 struct {
> >> -                       uint32_t seg_size;
> >> -                       uint32_t seg_align;
> >> -                       uint32_t num;
> >> +                       uint32_t seg_len;   /**< Minimum packet segment buffer
> >> +                                                length in bytes. It includes
> >> +                                                possible head-/tailroom bytes.
> >> +                                                The maximum value is defined by
> >> +                                                ODP_CONFIG_PACKET_SEG_LEN_MAX.
> >> +                                                Use 0 for default length. */
> >> +                       uint32_t seg_align; /**< Minimum packet segment buffer
> >> +                                                alignment in bytes. Valid
> >> +                                                values are powers of two. The
> >> +                                                maximum value is defined by
> >> +                                                ODP_CONFIG_PACKET_SEG_ALIGN_MAX
> >> +                                                . Use 0 for default alignment.
> >> +                                                Default will always be a
> >> +                                                multiple of 8.
> >> +                                            */
> >> +                       uint32_t seg_num;   /**< Number of packet segments in
> >> +                                                the pool. This is also the
> >> +                                                maximum number of packets,
> >> +                                                since each packet consist of
> >> +                                                at least one segment.
> > What if both seg_num and a shared memory region is specified in the
> > odp_pool_create call? Which takes precedence?
> >
> >> +                                            */
> >>                 } pkt;
> >> -*/
> >>                 struct {
> >>                         uint32_t __res1; /* Keep struct identical to buf, */
> >>                         uint32_t __res2; /* until pool implementation is fixed*/
> >> --
> >> 2.2.2
> >>
> >>
> >> _______________________________________________
> >> lng-odp mailing list
> >> [email protected]
> >> http://lists.linaro.org/mailman/listinfo/lng-odp
>