Hi Bala,
   I am trying to catch up on this conversation. I have a few questions
here; I am not sure if they have already been discussed.

1) Why is there a need for such a large number of queues (millions) in a
system? If the system meets the line rate, the number of packets (and
flows) waiting to be processed is small, so a small number of queues
should suffice if we want to have a queue per flow. I agree that the
system will receive packets belonging to millions of flows over a period
of time.

2) Are there any thoughts on implementing a software scheduler for
linux-generic which supports a large number of queues?

3) What will be the arbitration method (round robin, strict priority,
etc.) among the queues in a queue group? If it is weighted round robin
or strict priority, how are the priorities assigned to these queues?

4) Is the intent to propagate the flow ID from classification down the
packet processing pipeline?

Thank you,
Honnappa


On 17 November 2016 at 13:59, Bill Fischofer <bill.fischo...@linaro.org>
wrote:

> On Thu, Nov 17, 2016 at 3:05 AM, Bala Manoharan
> <bala.manoha...@linaro.org> wrote:
>
> > Regards,
> > Bala
> >
> >
> > On 15 November 2016 at 22:43, Brian Brooks <brian.bro...@linaro.org>
> > wrote:
> > > On Mon, Nov 14, 2016 at 2:12 AM, Bala Manoharan
> > > <bala.manoha...@linaro.org> wrote:
> > >> Regards,
> > >> Bala
> > >>
> > >>
> > >> On 11 November 2016 at 13:26, Brian Brooks
> > >> <brian.bro...@linaro.org> wrote:
> > >>> On 11/10 15:17:15, Bala Manoharan wrote:
> > >>>> On 10 November 2016 at 13:26, Brian Brooks
> > >>>> <brian.bro...@linaro.org> wrote:
> > >>>> > On 11/07 16:46:12, Bala Manoharan wrote:
> > >>>> >> Hi,
> > >>>> >
> > >>>> > Hiya
> > >>>> >
> > >>>> >> This mail thread discusses the design of the classification
> > >>>> >> queue group RFC. The same can be found in the google doc whose
> > >>>> >> link is given below.
> > >>>> >> Users can provide their comments either in this mail thread or
> > >>>> >> in the google doc as per their convenience.
> > >>>> >>
> > >>>> >> https://docs.google.com/document/d/1fOoG9WDR0lMpVjgMAsx8QsMr0YFK9slR93LZ8VXqM2o/edit?usp=sharing
> > >>>> >>
> > >>>> >> The basic issues with queues being a single target for a CoS
> > >>>> >> are twofold:
> > >>>> >>
> > >>>> >> Queues must be created and deleted individually. This imposes a
> > >>>> >> significant burden when queues are used to represent individual
> > >>>> >> flows, since the application may need to process thousands (or
> > >>>> >> millions) of flows.
> > >>>> >
> > >>>> > Wondering why there is an issue with creating and deleting
> > >>>> > queues individually if queue objects represent millions of flows..
> > >>>>
> > >>>> The queue groups are mainly required for hashing the incoming
> > >>>> packets to multiple flows based on the hash configuration.
> > >>>> So from the application's point of view, it just needs a queue to
> > >>>> hold packets belonging to the same flow, and packets belonging to
> > >>>> different flows should be placed in different queues. It does not
> > >>>> matter who creates the flow/queue.
> > >>>
> > >>> When the application receives an event from the odp_schedule()
> > >>> call, how does it know whether the odp_queue_t was previously
> > >>> created by the application from odp_queue_create() or whether it
> > >>> was created by the implementation?
> > >>
> > >> odp_schedule() returns the queue from which the event was dequeued.
> > >> The type of the queue can be obtained from the odp_queue_type() API.
> > >> But the question is: is there a use case where the application needs
> > >> to know?
> > >> The application has the information of the queues it has created,
> > >> and the queues created by the implementation are destroyed by the
> > >> implementation.
> > >
> > > If certain fields of the packet are hashed to a queue handle, and
> > > this queue handle has not previously been created via
> > > odp_queue_create(), there might be a use case where the application
> > > needs to be aware of a new "flow"..
> > > Maybe the application ages flows.
> >
>
> I'm not sure I understand the concern being raised here. Packet fields are
> matched against PMRs to get a matching CoS. That CoS, in turn, is
> associated with a queue or a queue group. If the latter, then specified
> subfields within the packet are hashed to generate an index into that queue
> group to select the individual queue within the target queue group that is
> to receive the packet. Whether these queues have been preallocated at
> odp_queue_group_create() time, or allocated dynamically on first reference
> is up to the implementation, however especially in the case of "large"
> queue groups it can be expected that the number of actual queues in use
> will be sparse so a deferred allocation strategy will most likely be used.
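>
> To make the hashing step concrete, here is a minimal sketch of how an
> implementation might select, and lazily create, the target queue inside
> a queue group. All names here (queue_grp_t, hash_subfields(),
> create_group_queue()) are hypothetical, not part of ODP or this RFC:
>
>   /* Hypothetical implementation-side lookup: hash the configured
>    * subfields, mask into the group, create the queue on first use.
>    * Assumes #include <odp_api.h>. */
>   typedef struct {
>       uint32_t     num_queues;  /* rounded up to a power of 2 */
>       odp_queue_t *queues;      /* sparse; ODP_QUEUE_INVALID if unused */
>   } queue_grp_t;
>
>   uint32_t    hash_subfields(odp_packet_t pkt);       /* hypothetical */
>   odp_queue_t create_group_queue(queue_grp_t *grp, uint32_t idx);
>
>   static odp_queue_t queue_grp_lookup(queue_grp_t *grp, odp_packet_t pkt)
>   {
>       uint32_t hash = hash_subfields(pkt);           /* e.g. UDP ports */
>       uint32_t idx  = hash & (grp->num_queues - 1);  /* power-of-2 mask */
>
>       if (grp->queues[idx] == ODP_QUEUE_INVALID)     /* deferred alloc */
>           grp->queues[idx] = create_group_queue(grp, idx);
>
>       return grp->queues[idx];
>   }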
>
> Applications are aware of flows because that's what an individual queue
> coming out of the classifier represents. An interesting question arises
> if a higher-level protocol (e.g., a TCP FIN sequence) ends a given flow,
> meaning that the context represented by an individual queue within a queue
> group can be released. Especially in the case of sparse queue groups it
> might be worthwhile to have an API that can communicate this flow release
> back to the classifier to facilitate queue resource management.
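>
> A possible shape for such a hint, purely illustrative rather than a
> concrete proposal (the function name and signature are invented here):
>
>   /* Illustrative only: tell the classifier that the flow bound to
>    * queue 'flow_q' of group 'grp' has ended (e.g. a TCP FIN/RST was
>    * seen), so the implementation may reclaim the queue's resources. */
>   int odp_queue_group_flow_release(odp_queue_group_t grp,
>                                    odp_queue_t flow_q);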
>
>
> >
> > It is very difficult in network traffic to predict the exact flows
> > that will be arriving on an interface.
> > The application can configure for all the possible flows in that case.
> >
>
> Not sure what you mean by the application configuring. If the hash is
> for a UDP port, for example, then the queue group has 64K (logical)
> queues associated with it. Which of these are active (and hence require
> instantiation) depends on the inbound traffic that is received, which may
> be unpredictable. But the management of this is an ODP implementation
> concern rather than an application concern, unless we extend the API with a
> flow release hint as suggested above.
>
>
> >
> > >
> > >>
> > >>>
> > >>>> It is actually simpler if the implementation creates a flow,
> > >>>> since in that case the implementation need not accumulate
> > >>>> meta-data for all possible hash values in a queue group; a queue
> > >>>> can be created when traffic arrives in that particular flow.
> > >>>>
> > >>>> >
> > >>>> > Could an application ever call odp_schedule() and receive an
> > >>>> > event (e.g. packet) from a queue (of opaque type odp_queue_t)
> > >>>> > when that queue has never been created by the application (via
> > >>>> > odp_queue_create())? Could that ever happen from the hardware,
> > >>>> > and could the application ever handle that?
> > >>>>
> > >>>> No. All the queues in the system are created by the application,
> > >>>> either directly or indirectly.
> > >>>> In case of queue groups, the queues are indirectly created by the
> > >>>> application by configuring a queue group.
> > >>>>
> > >>>> > Or, is it related to memory usage? The reference implementation
> > >>>> > struct queue_entry_s is 320 bytes on a 64-bit machine.
> > >>>> >
> > >>>> >   2^28 ~= 268,435,456 queues -> 81.920 GB
> > >>>> >   2^26 ~=  67,108,864 queues -> 20.480 GB
> > >>>> >   2^22 ~=   4,194,304 queues ->  1.280 GB
> > >>>> >
> > >>>> > Forget about 320 bytes per queue, if each queue was represented
> > >>>> > by a 32-bit integer (4 bytes!) the usage would be:
> > >>>> >
> > >>>> >   2^28 ~= 268,435,456 queues ->  1.024 GB
> > >>>> >   2^26 ~=  67,108,864 queues ->    256 MB
> > >>>> >   2^22 ~=   4,194,304 queues ->     16 MB
> > >>>> >
> > >>>> > That still might be a lot of usage if the application must
> > >>>> > explicitly create every queue (before it is used) and require
> > >>>> > an ODP implementation to map between every ODP queue object
> > >>>> > (opaque type) and the internal queue.
> > >>>> >
> > >>>> > Let's say the ODP API has two classes of handles: 1) pointers,
> > >>>> > 2) integers. An opaque pointer is used to point to some other
> > >>>> > software object. This object should be larger than 64 bits (or
> > >>>> > 32 bits on a chip in 32-bit pointer mode) otherwise it could
> > >>>> > just be represented in a 64-bit (or 32-bit) integer type value!
> > >>>> >
> > >>>> > To support millions of queues (flows) should odp_queue_t be an
> > >>>> > integer type in the API? A software-only implementation may
> > >>>> > still use 320 bytes per queue and use that integer as an index
> > >>>> > into an array or as a key for a lookup operation on a data
> > >>>> > structure containing queues. An implementation with hardware
> > >>>> > assist may use this integer value directly when interfacing
> > >>>> > with hardware!
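> > >>>> >
> > >>>> > A tiny sketch of the integer-handle idea (my_queue_t, queue_tbl
> > >>>> > and MY_MAX_QUEUES are hypothetical, not ODP API; queue_entry_s
> > >>>> > is the reference implementation's 320-byte struct):
> > >>>> >
> > >>>> >   /* Handle is a plain 32-bit index: software indexes a table,
> > >>>> >    * hardware can consume the same value directly. */
> > >>>> >   typedef uint32_t my_queue_t;
> > >>>> >
> > >>>> >   #define MY_MAX_QUEUES (1u << 22)      /* 4,194,304 queues */
> > >>>> >   static struct queue_entry_s queue_tbl[MY_MAX_QUEUES];
> > >>>> >
> > >>>> >   static inline struct queue_entry_s *my_queue_get(my_queue_t h)
> > >>>> >   {
> > >>>> >       return &queue_tbl[h];  /* O(1), no pointers in the API */
> > >>>> >   }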
> > >>>>
> > >>>> I believe I have answered this question based on the explanation
> > >>>> above. Please feel free to point out if something is not clear.
> > >>>>
> > >>>> >
> > >>>> > Would it still be necessary to assign a "name" to each queue
> (flow)?
> > >>>>
> > >>>> "name" per queue might not be required since it would mean a
> character
> > >>>> based lookup across millions of items.
> > >>>>
> > >>>> >
> > >>>> > Would a queue (flow) also require an "op type" to explicitly
> > specify whether
> > >>>> > access to the queue (flow) is threadsafe? Atomic queues are
> > threadsafe since
> > >>>> > only 1 core at any given time can receive from it. Parallel queues
> > are also
> > >>>> > threadsafe. Are all ODP APIs threadsafe?
> > >>>>
> > >>>> There are two types of queue enqueue operation: ODP_QUEUE_OP_MT
> > >>>> and ODP_QUEUE_OP_MT_UNSAFE.
> > >>>> The rest of the ODP APIs are multi-thread safe, since in ODP there
> > >>>> is no defined way in which a single packet can be given to more
> > >>>> than one core at the same time, as packets move across different
> > >>>> modules through queues.
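> > >>>>
> > >>>> For reference, the mode is chosen per queue at creation time. A
> > >>>> short sketch with the existing API (written from memory):
> > >>>>
> > >>>>   odp_queue_param_t qparam;
> > >>>>
> > >>>>   odp_queue_param_init(&qparam);
> > >>>>   qparam.type     = ODP_QUEUE_TYPE_PLAIN;
> > >>>>   qparam.enq_mode = ODP_QUEUE_OP_MT_UNSAFE; /* single producer */
> > >>>>   qparam.deq_mode = ODP_QUEUE_OP_MT;        /* MT-safe dequeue */
> > >>>>
> > >>>>   odp_queue_t q = odp_queue_create("sp_queue", &qparam);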
> > >>>>
> > >>>> >
> > >>>> >> A single PMR can only match a packet to a single queue
> > >>>> >> associated with
> > >>>> >> a target CoS. This prohibits efficient capture of subfield
> > >>>> >> classification.
> > >>>> >
> > >>>> > odp_cls_pmr_create() can take more than 1 odp_pmr_param_t, so
> > >>>> > it is possible to create a single PMR which matches multiple
> > >>>> > fields of a packet. I can imagine a case where a packet matches
> > >>>> > pmr1 (match Vlan) and also matches pmr2 (match Vlan AND match
> > >>>> > L3DestIP). Is that an example of subfield classification?
> > >>>> > How does the queue relate?
> > >>>>
> > >>>> This question is related to classification. If a PMR is
> > >>>> configured with more than one odp_pmr_param_t, then the PMR is
> > >>>> considered a hit only if the packet matches all the configured
> > >>>> params.
> > >>>>
> > >>>> Consider the following,
> > >>>>
> > >>>> pktio1 (Default_CoS) ==== PMR1 ====> CoS1 ====PMR2 ====> CoS2.
> > >>>>
> > >>>> 1) Any packet arriving in pktio1 will be assigned to Default_CoS
> > >>>> and will be first applied with PMR1.
> > >>>> 2) If the packet matches PMR1 it will be delivered to CoS1.
> > >>>> 3) If the packet does not match PMR1 then it will remain in
> > >>>> Default_CoS.
> > >>>> 4) Any packets arriving in CoS1 will be applied with PMR2. If the
> > >>>> packet matches PMR2 then it will be delivered to CoS2.
> > >>>> 5) If the packet does not match PMR2 it will remain in CoS1.
> > >>>>
> > >>>>
> > >>>> Each CoS will be configured with a queue group.
> > >>>> Based on the final CoS of the packet, the hash configuration (RSS)
> > >>>> of the queue group will be applied to the packet and the packet
> > >>>> will be spread across the queues within the queue group.
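> > >>>>
> > >>>> As a rough sketch with the existing classification API (values are
> > >>>> illustrative; pkt_pool, cos1_queue and default_cos are assumed to
> > >>>> exist already), step 2 above could be set up as:
> > >>>>
> > >>>>   odp_cls_cos_param_t cos_param;
> > >>>>   odp_pmr_param_t pmr_param;
> > >>>>   uint16_t vlan_id = 0x0064, vlan_mask = 0x0fff;
> > >>>>
> > >>>>   /* CoS1, reached from Default_CoS when PMR1 matches */
> > >>>>   odp_cls_cos_param_init(&cos_param);
> > >>>>   cos_param.pool  = pkt_pool;
> > >>>>   cos_param.queue = cos1_queue; /* or a queue group, per this RFC */
> > >>>>   odp_cos_t cos1 = odp_cls_cos_create("CoS1", &cos_param);
> > >>>>
> > >>>>   /* PMR1: match the VLAN id, Default_CoS -> CoS1 */
> > >>>>   odp_cls_pmr_param_init(&pmr_param);
> > >>>>   pmr_param.term        = ODP_PMR_VLAN_ID_0;
> > >>>>   pmr_param.match.value = &vlan_id;
> > >>>>   pmr_param.match.mask  = &vlan_mask;
> > >>>>   pmr_param.val_sz      = sizeof(vlan_id);
> > >>>>   odp_pmr_t pmr1 = odp_cls_pmr_create(&pmr_param, 1,
> > >>>>                                       default_cos, cos1);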
> > >>>
> > >>> Got it. So Classification PMR CoS happens entirely before Queue
> > >>> Groups. And with Queue Groups it allows a single PMR to match a
> > >>> packet and assign that packet to 1 out of many queues instead of
> > >>> just 1 queue only.
> > >>>
> > >>>> Hope this clarifies.
> > >>>> Bala
> > >>>>
> > >>>> >
> > >>>> >> To solve these issues, Tiger Moth introduces the concept of a
> queue
> > >>>> >> group. A queue group is an extension to the existing queue
> > >>>> >> specification in a Class of Service.
> > >>>> >>
> > >>>> >> Queue groups solve the classification issues associated with
> > >>>> >> individual queues in three ways:
> > >>>> >>
> > >>>> >> * The odp_queue_group_create() API can create a large number of
> > >>>> >> related queues with a single call.
> > >>>> >
> > >>>> > If the application calls this API, does that mean the ODP
> > >>>> > implementation can create a large number of queues? What happens
> > >>>> > if the application receives an event on a queue that was created
> > >>>> > by the implementation--how does the application know whether
> > >>>> > this queue was created by the hardware according to the ODP
> > >>>> > Classification or whether the queue was created by the
> > >>>> > application?
> > >>>> >
> > >>>> >> * A single PMR can spread traffic to many queues associated
> > >>>> >> with the same CoS by assigning packets matching the PMR to a
> > >>>> >> queue group rather than a queue.
> > >>>> >> * A hashed PMR subfield is used to distribute individual queues
> > >>>> >> within a queue group for scheduling purposes.
> > >>>> >
> > >>>> > Is there a way to write a test case for this? Trying to think
> > >>>> > of what kind of packets (traffic distribution) and how those
> > >>>> > packets would get classified and get assigned to queues.
> > >>>> >
> > >>>> >> diff --git a/include/odp/api/spec/classification.h
> > >>>> >> b/include/odp/api/spec/classification.h
> > >>>> >> index 6eca9ab..cf56852 100644
> > >>>> >> --- a/include/odp/api/spec/classification.h
> > >>>> >> +++ b/include/odp/api/spec/classification.h
> > >>>> >> @@ -126,6 +126,12 @@ typedef struct odp_cls_capability_t {
> > >>>> >>
> > >>>> >> /** A Boolean to denote support of PMR range */
> > >>>> >> odp_bool_t pmr_range_supported;
> > >>>> >> +
> > >>>> >> + /** A Boolean to denote support of queue group */
> > >>>> >> + odp_bool_t queue_group_supported;
> > >>>> >> +
> > >>>> >> + /** A Boolean to denote support of queue */
> > >>>> >> + odp_bool_t queue_supported;
> > >>>> >> } odp_cls_capability_t;
> > >>>> >>
> > >>>> >>
> > >>>> >> /**
> > >>>> >> @@ -162,7 +168,18 @@ typedef enum {
> > >>>> >>  * Used to communicate class of service creation options
> > >>>> >>  */
> > >>>> >> typedef struct odp_cls_cos_param {
> > >>>> >> - odp_queue_t queue; /**< Queue associated with CoS */
> > >>>> >> + /** If type is ODP_QUEUE_T, odp_queue_t is linked with the
> > >>>> >> + * CoS; if type is ODP_QUEUE_GROUP_T, odp_queue_group_t is
> > >>>> >> + * linked with the CoS.
> > >>>> >> + */
> > >>>> >> + odp_queue_type_e type;
> > >>>> >> +
> > >>>> >> + union {
> > >>>> >> + /** Queue associated with CoS */
> > >>>> >> + odp_queue_t queue;
> > >>>> >> +
> > >>>> >> + /** Queue Group associated with CoS */
> > >>>> >> + odp_queue_group_t queue_group;
> > >>>> >> + };
> > >>>> >> odp_pool_t pool; /**< Pool associated with CoS */
> > >>>> >> odp_cls_drop_t drop_policy; /**< Drop policy associated with CoS */
> > >>>> >> } odp_cls_cos_param_t;
> > >>>> >>
> > >>>> >>
> > >>>> >> diff --git a/include/odp/api/spec/queue.h
> > b/include/odp/api/spec/queue.h
> > >>>> >> index 51d94a2..7dde060 100644
> > >>>> >> --- a/include/odp/api/spec/queue.h
> > >>>> >> +++ b/include/odp/api/spec/queue.h
> > >>>> >> @@ -158,6 +158,87 @@ typedef struct odp_queue_param_t {
> > >>>> >> odp_queue_t odp_queue_create(const char *name,
> > >>>> >> const odp_queue_param_t *param);
> > >>>> >>
> > >>>> >> +/**
> > >>>> >> + * Queue group capability
> > >>>> >> + * This capability structure defines the system Queue Group
> > >>>> >> + * capability
> > >>>> >> + */
> > >>>> >> +typedef struct odp_queue_group_capability_t {
> > >>>> >> + /** Number of queues supported per queue group */
> > >>>> >> + unsigned supported_queues;
> > >>>> >> + /** Supported protocol fields for hashing */
> > >>>> >> + odp_pktin_hash_proto_t supported;
> > >>>> >> +} odp_queue_group_capability_t;
> > >>>> >> +
> > >>>> >> +/**
> > >>>> >> + * ODP Queue Group parameters
> > >>>> >> + * Queue group supports only schedule queues <TBD??>
> > >>>> >> + */
> > >>>> >> +typedef struct odp_queue_group_param_t {
> > >>>> >> + /** Number of queues to be created for this queue group.
> > >>>> >> + * The implementation may round up the value to the nearest
> > >>>> >> + * power of 2
> > >>>
> > >>> Wondering what this means for obtaining the max number of queues
> > >>> supported by the system via odp_queue_capability()..
> > >>>
> > >>> powers of 2..
> > >>>
> > >>> If the platform supports 2^16 (65,536) queues, odp_queue_capability()
> > >>> max_queues should report 65,536 queues, right?
> > >>>
> > >>> If an odp_queue_group_t is created requesting 2^4 (16) queues, should
> > >>> odp_queue_capability() now return (65,536 - 16) 65520 queues or
> > >>> (2^12) 4096 queues?
> > >>
> > >> odp_queue_capability() is called before creating the queue group,
> > >> so if an implementation has the limitation that it can only support
> > >> 2^16 queues, then the application has to configure at most 2^16
> > >> queues in the queue group.
> > >
> > > In this use case, wouldn't all queues then be reserved for creation
> > > by the implementation? And, now odp_queue_create() will always return
> > > the null handle?
> > >
> > > What happens if you do:
> > > 1. odp_queue_capability() -> 2^16 queues
> > > 2. odp_queue_group_create( 2^4 queues )
> > > 3. odp_queue_capability() -> ???
> >
> > This is a limit on the number of queues supported by a queue group.
> > This does not reflect the number of queues created using the
> > odp_queue_create() function.
> > The implementation advertises the maximum number of queues it can
> > support within a queue group; the application is free to configure any
> > number less than the maximum supported.
> >
>
> Capabilities in ODP are used to specify implementation limits, not current
> allocations. For example, in odp-linux there is currently a limit of 64
> pools that can be created. It doesn't matter how many are currently created
> as that is simply the system limit on odp_pool_create(). The same would
> apply for queues and queue groups. An implementation may be limited to N
> queue groups that can contain a maximum of K queues each. Separately the
> implementation might have a limit of X total queues it can support. How
> these are divided among individual queues or queues that are members of
> queue groups should not affect these capability limits, which are static.
>
> When an allocation request is made and an internal limit is exceeded the
> allocation request simply fails. The capabilities are there to guide the
> application in its allocation requests so that such "surprises" are rare.
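>
> In code the pattern looks like this (a sketch; pool parameter setup is
> omitted):
>
>   odp_pool_capability_t capa;
>   odp_pool_param_t pool_param;   /* filled in elsewhere */
>
>   if (odp_pool_capability(&capa) == 0)
>       printf("static limit: %u pools\n", capa.max_pools);
>
>   /* The capability does not shrink as pools are created; exceeding an
>    * internal limit simply shows up as a failed allocation. */
>   odp_pool_t pool = odp_pool_create("pkts", &pool_param);
>   if (pool == ODP_POOL_INVALID)
>       ; /* handle allocation failure */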
>
>
> >
> > >
> > >>>
> > >>> Could there be a dramatic effect on the total number of queues when
> > >>> many odp_queue_group_t have been created? E.g. 4 odp_queue_group_t
> > >>> created requesting 2^4 (16) queues -> 2^4, 2^4, 2^4, 2^4. All 16 bits
> > >>> used and effective number of queues is (16+16+16+16) 64 queues.
> > >>>
> > >>> Is it possible to flexibly utilize all 2^16 queues the platform
> > >>> supports regardless of whether the queue was created by the
> > >>> implementation or explicitly created by the application?
> > >>
> > >> This limitation is per queue group, and there can be a limit on the
> > >> total number of queue groups in the system.
> > >> Usually the number of queue groups supported would be small.
> > >>
> > >>>
> > >>> If so, is there a way to store this extra bit of information--whether
> > >>> a queue was created by the implementation or the application?
> > >>> One of the 16 bits might work.
> > >>> But, this reduces the number of queues to (2^15) 32768.
> > >>> ..at least they are fully utilizable by both implementation and
> > >>> application.
> > >>
> > >> There are different queue types, and we can add ODP_QUEUE_GROUP_T
> > >> as a new type to differentiate a queue created by odp_queue_create()
> > >> from one created via the queue group create function.
> > >>
> > >> When destroying the resources, the application destroys the queues
> > >> it created, and the implementation destroys the queues within a
> > >> queue group when the application destroys the queue group.
> > >>
> > >>>
> > >>> When the application receives an odp_event_t from odp_queue_t after
> > >>> a call to odp_schedule(), could the application call..
> > >>> odp_queue_domain() to check whether this odp_queue_t was created by
> > >>> the implementation or the application? Function returns that bit.
> > >>
> > >> Since we have a queue type which can be obtained using the
> > >> odp_queue_type() function, I think this odp_queue_domain() API is
> > >> not needed.
> > >
> > > Petri pointed out this week that a packet_io's destination queue
> > > may also be another case that could use a queue_group.
> > > Perhaps what is needed is a way to connect blocks (ODP objects)
> > > together like legos using something other than an odp_queue_t --
> > > because what flows through these blocks are events from one
> > > (and now more!) odp_queue_t.  Whether the queue was created by
> > > the implementation or application is a separate concern.
> >
>
> One possible extension area similar to this would be link bonding where
> multiple pktios are bonded together for increased throughput and/or
> failover (depending on whether the bond is active/active or
> active/standby). We alluded to this in a recent ARCH call where TM talks
> to a single PktIO; however, that PktIO might represent multiple links in
> this case. A more generalized "group" concept might be an easy way to
> achieve that here.
>
>
> > >
> > >>>
> > >>> If the queue domain is implementation, could it be an event
> > >>> (newly arrived packet) that came through Classification PMR CoS
> > >>> (CPC)? The packet is assigned to an odp_queue_t (flow) (created by
> > >>> the implementation) as defined by the CPC that was set up by the
> > >>> application.
> > >>> Might want efficient access to packet metadata which was populated
> > >>> as an effect of the packet passing through CPC stage.
> > >>>
> > >>> If the queue domain is application, could it be an event
> > >>> (crypto compl, or any synchronization point against ipblock or
> > >>> device over PCI bus that indicates some assist/acceleration work
> > >>> has finished) that comes from an odp_queue_t previously created by the
> > >>> application via a call to odp_queue_create() (which sets that bit)?
> > >>> This queue would be any queue (not necessarily a packet 'flow')
> > >>> created by the data plane software (application).
> > >>
> > >> We already have a queue type and an event type which differentiate
> > >> the events as BUFFER, PACKET, TIMEOUT, CRYPTO_COMPL. Also, the
> > >> packet flow queues can be created only using HW, since they are
> > >> mainly useful for spreading the packets across multiple flows.
> > >>
> > >> -Bala
> > >>>
> > >>>> >> + * and the value should be less than the number of queues
> > >>>> >> + * supported per queue group.
> > >>>> >> + */
> > >>>> >> + unsigned num_queue;
> > >>>> >> +
> > >>>> >> + /** Protocol field selection for queue group distribution
> > >>>> >> + * Multiple fields can be selected in combination
> > >>>> >> + */
> > >>>> >> + odp_queue_group_hash_proto_t hash;
> > >>>> >> +
> > >>>> >> +} odp_queue_group_param_t;
> > >>>> >> +
> > >>>> >> +/**
> > >>>> >> + * Initialize queue group params
> > >>>> >> + *
> > >>>> >> + * Initialize an odp_queue_group_param_t to its default
> > >>>> >> + * values for all fields.
> > >>>> >> + *
> > >>>> >> + * @param param   Address of the odp_queue_group_param_t to
> > >>>> >> + * be initialized
> > >>>> >> + */
> > >>>> >> +void odp_queue_group_param_init(odp_queue_group_param_t *param);
> > >>>> >> +
> > >>>> >> +/**
> > >>>> >> + * Queue Group create
> > >>>> >> + *
> > >>>> >> + * Create a queue group according to the queue group parameters.
> > >>>> >> + * The individual queues belonging to a queue group are
> > >>>> >> + * created by the implementation, and the distribution of
> > >>>> >> + * packets into those queues is decided based on the
> > >>>> >> + * odp_queue_group_hash_proto_t parameters.
> > >>>> >> + * The individual queues within a queue group are both
> > >>>> >> + * created and deleted by the implementation.
> > >>>> >> + *
> > >>>> >> + * @param name    Queue Group name
> > >>>> >> + * @param param   Queue Group parameters.
> > >>>> >> + *
> > >>>> >> + * @return Queue group handle
> > >>>> >> + * @retval ODP_QUEUE_GROUP_INVALID on failure
> > >>>> >> + */
> > >>>> >> +odp_queue_group_t odp_queue_group_create(const char *name,
> > >>>> >> + const odp_queue_group_param_t *param);
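> > >>>> >>
> > >>>> >> A usage sketch of the proposed API (the hash bit-field name
> > >>>> >> ipv4_udp is assumed from odp_pktin_hash_proto_t, and pkt_pool
> > >>>> >> is assumed to exist):
> > >>>> >>
> > >>>> >>   odp_queue_group_param_t grp_param;
> > >>>> >>   odp_cls_cos_param_t cos_param;
> > >>>> >>
> > >>>> >>   odp_queue_group_param_init(&grp_param);
> > >>>> >>   grp_param.num_queue           = 1 << 16; /* may be rounded */
> > >>>> >>   grp_param.hash.proto.ipv4_udp = 1;       /* assumed field */
> > >>>> >>
> > >>>> >>   odp_queue_group_t grp =
> > >>>> >>       odp_queue_group_create("udp_flows", &grp_param);
> > >>>> >>
> > >>>> >>   /* Attach the group to a CoS via the proposed union */
> > >>>> >>   odp_cls_cos_param_init(&cos_param);
> > >>>> >>   cos_param.type        = ODP_QUEUE_GROUP_T;
> > >>>> >>   cos_param.queue_group = grp;
> > >>>> >>   cos_param.pool        = pkt_pool;
> > >>>> >>   odp_cos_t cos = odp_cls_cos_create("udp_cos", &cos_param);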
> > >>>> >> Regards,
> > >>>> >> Bala
> >
>
