sched: Introduce the TBS Qdisc

Jesus Sanchez-Palencia Mon, 09 Apr 2018 09:40:25 -0700

Hi Thomas,

On 03/28/2018 12:48 AM, Thomas Gleixner wrote:

(...)

>
> There are two modes:
>
>       1) Send at the given TX time (Explicit mode)
>
>       2) Send before given TX time (Deadline mode)
>
> There is no need to specify 'drop if late' simply because if the message is
> handed in past the given TX time, it's too late by definition. What you are
> trying to implement is a hybrid of TSN and general purpose (not time aware)
> networking in one go. And you do that because your overall design is not
> looking at the big picture. You designed from a given use case assumption
> and tried to fit other things into it with duct tape.

Ok, I see the difference now, thanks. I have just two more questions about the
deadline mode, please see below.

(...)

>
>>> Coming back to the overall scheme. If you start upfront with a time slice
>>> manager which is designed to:
>>>
>>>   - Handle multiple channels
>>>
>>>   - Expose the time constraints, properties per channel
>>>
>>> then you can fit all kind of use cases, whether designed by committee or
>>> not. You can configure that thing per node or network wide. It does not
>>> make a difference. The only difference are the resulting constraints.
>>
>>
>> Ok, and I believe the above was covered by what we had proposed before, unless
>> what you meant by time constraints is beyond the configured port schedule.
>>
>> Are you suggesting that we'll need to have a kernel entity that is not only
>> aware of the current traffic classes 'schedule', but also of the resources that
>> are still available for new streams to be accommodated into the classes? Putting
>> it differently, is the TAS you envision just an entity that runs a schedule, or
>> is it a time-aware 'orchestrator'?
>
> In the first place it's something which runs a defined schedule.
>
> The accommodation for new streams is required, but not necessarily at the
> root qdisc level. That might be a qdisc feeding into it.
>
> Assume you have a bandwidth reservation, aka time slot, for audio. If your
> audio related qdisc does deadline scheduling then you can add new streams
> to it up to the point where it's no longer able to fit.
>
> The only thing which might be needed at the root qdisc is the ability to
> utilize unused time slots for other purposes, but that's not required to be
> there in the first place as long as it's designed in a way that it can be
> added later on.

Ok, agreed.
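
The audio example above (add streams to a slot until it no longer fits) can be
sketched as a simple admission check. This is only an illustration, assuming
each stream declares a worst-case number of bytes per cycle; the class and
method names are hypothetical, and framing overhead (preamble, inter-frame gap)
is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SlotAdmission:
    """Admission control for one traffic class (hypothetical helper)."""
    slot_ns: int    # length of the class's time slot per cycle
    link_bps: int   # link speed in bits per second
    streams: dict = field(default_factory=dict)  # name -> bytes per cycle

    def wire_time_ns(self, nbytes: int) -> int:
        # Time needed to put nbytes on the wire, rounded up to a whole ns.
        return -(-nbytes * 8 * 1_000_000_000 // self.link_bps)

    def used_ns(self) -> int:
        # Wire time already reserved by the admitted streams.
        return sum(self.wire_time_ns(b) for b in self.streams.values())

    def admit(self, name: str, bytes_per_cycle: int) -> bool:
        # Accept the stream only if the slot can still carry it.
        if self.used_ns() + self.wire_time_ns(bytes_per_cycle) > self.slot_ns:
            return False
        self.streams[name] = bytes_per_cycle
        return True
```

With a 10 us slot on a 1 Gb/s link, a 1000-byte stream fits, a further
300-byte stream is rejected, and a 250-byte stream fills the slot exactly.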

>
>>> So let's look once more at the picture in an abstract way:
>>>
>>>                  [ NIC ]
>>>                     |
>>>           [ Time slice manager ]
>>>              |            |
>>>           [ Ch 0 ] ... [ Ch N ]
>>>
>>> So you have a bunch of properties here:
>>>
>>> 1) Number of Channels ranging from 1 to N
>>>
>>> 2) Start point, slice period and slice length per channel
>>
>> Ok, so we agree that a TAS entity is needed. Assuming that channels are traffic
>> classes, do you have something else in mind other than a new root qdisc?
>
> Whatever you call it, the important point is that it is the gate keeper to
> the network adapter and there is no way around it. It fully controls the
> timed schedule, however simple or complex it may be.

Ok, and I've finally understood the nuance between the above and what we had
planned initially.

(...)

>>
>> * TAS:
>>
>>    The idea we are currently exploring is to add a "time-aware", priority based
>>    qdisc, that also exposes the Tx queues available and provides a mechanism for
>>    mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>>    mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would be:
>>
>>    $ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4    \
>>                 map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3                  \
>>                 queues 0 1 2 3                                       \
>>                 sched-file gates.sched [base-time <interval>]        \
>>                 [cycle-time <interval>] [extension-time <interval>]
>>
>>    <file> is multi-line, with each line being of the following format:
>>    <cmd> <gate mask> <interval in nanoseconds>
>>
>>    Qbv only defines one <cmd>: "S" for 'SetGates'
>>
>>    For example:
>>
>>    S 0x01 300
>>    S 0x03 500
>>
>>    This means that there are two intervals, the first will have the gate
>>    for traffic class 0 open for 300 nanoseconds, the second will have
>>    both traffic classes open for 500 nanoseconds.
>
> To accommodate stuff like control systems you also need a base line, which
> is not expressed as interval. Otherwise you can't schedule network wide
> explicit plans. That's either an absolute network-time (TAI) time stamp or
> an offset to a well defined network-time (TAI) time stamp, e.g. start of
> epoch or something else which is agreed on. The actual schedule then fast
> forwards past now (TAI) and sets up the slots from there. That makes node
> hotplug possible as well.

Sure, and the [base-time <interval>] on the command line above was actually
wrong. It should have been expressed as [base-time <timestamp>].
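
To make the schedule format and the "fast forward past now" idea concrete, here
is a minimal sketch. It assumes the cycle time is simply the sum of the
intervals in the file and that base-time and 'now' are nanosecond timestamps on
the same clock; the function names are hypothetical and not part of any
proposed interface.

```python
def parse_schedule(text):
    """Parse '<cmd> <gate mask> <interval ns>' lines into (mask, interval) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        cmd, mask, interval = line.split()
        if cmd != "S":  # Qbv defines only 'SetGates'
            raise ValueError(f"unsupported command: {cmd}")
        entries.append((int(mask, 16), int(interval)))
    return entries


def gates_at(entries, base_time_ns, now_ns):
    """Return (gate mask, ns left in the current interval) at now_ns."""
    cycle_ns = sum(interval for _, interval in entries)
    if now_ns < base_time_ns:
        raise ValueError("schedule has not started yet")
    # Fast-forward: only the offset inside the current cycle matters,
    # so a node joining late lands in the same slot as everyone else.
    offset = (now_ns - base_time_ns) % cycle_ns
    for mask, interval in entries:
        if offset < interval:
            return mask, interval - offset
        offset -= interval
    raise AssertionError("offset is always smaller than the cycle length")
```

For the two-line example file above, a node that starts many cycles after
base-time still computes the same open gates as a node that was present from
the beginning, which is what makes hotplug workable.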

>> It would handle multiple channels and expose their constraints / properties.
>> Each channel also becomes a traffic class, so other qdiscs can be attached to
>> them separately.
>
> Right.
>
>> So, in summary, because our entire design is based on qdisc interfaces, what we
>> had proposed was a root qdisc (the time slice manager, as you put) that allows
>> for other qdiscs to be attached to each channel. The inner qdiscs define the
>> queueing modes for each channel, and tbs is just one of those modes. I
>> understand now that you want to allow for fully dynamic use-cases to be
>> supported as well, which we hadn't covered with our TAS proposal before because
>> we hadn't envisioned it being used for these systems' design.
>
> Yes, you have the root qdisc, which is in charge of the overall scheduling
> plan, how complex or not it is defined does not matter. It exposes traffic
> classes which have properties defined by the configuration.

Perfect. Let's see if we can agree on an overall plan, then. Hopefully I'm not
missing anything.

For the above we'll develop a new qdisc, designed along the 'taprio' ideas, thus
a Qbv style scheduler, to be used as root qdisc. It can run the schedule inside
the kernel or just offload it to the NIC if supported. Similarly to the other
multiqueue qdiscs, it will expose the HW Tx queues.

What is new here from the ideas we shared last year is that this new root qdisc
will be responsible for calling the attached qdiscs' dequeue functions during
their timeslices, making it the only entity capable of enqueueing packets into
the NIC.
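
As a rough illustration of that gatekeeper role, the sketch below models a root
scheduler that owns one child queue per traffic class and only pulls from a
child while that class's gate is open. The names are hypothetical, and serving
the lowest-numbered open class first is an assumption made for the example, not
something decided in this thread.

```python
from collections import deque


class RootScheduler:
    """Toy root qdisc: the only path from the child queues to the NIC."""

    def __init__(self, num_tc):
        # One child queue per traffic class (plain FIFOs in this sketch).
        self.children = [deque() for _ in range(num_tc)]

    def enqueue(self, tc, packet):
        self.children[tc].append(packet)

    def dequeue(self, gate_mask):
        # Pull from the lowest-numbered class whose gate is open
        # and which has something queued.
        for tc, child in enumerate(self.children):
            if gate_mask & (1 << tc) and child:
                return child.popleft()
        return None
```

A packet queued on class 0 stays put while only gate 1 is open (mask 0x02) and
is released once a mask with bit 0 set is active.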

This is the "global scheduler", but we still need the txtime aware qdisc. For
that, we'll modify tbs to accommodate the feedback from this thread. More below.

>
> The qdiscs which are attached to those traffic classes can be anything
> including:
>
>  - Simple feed through (Applications are time constraints aware and set the
>    exact schedule). qdisc has admission control.

This will be provided by the tbs qdisc. It will still provide a txtime sorted
list and hw offload, but now there will be a per-socket option that tells the
qdisc if the per-packet timestamp is the txtime (i.e. explicit mode, as you've
called it) or a deadline. The drop_if_late flag will be removed.

When in explicit mode, packets from that socket are dequeued from the qdisc
during its time slice if their [(txtime - delta) < now].
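
The explicit-mode rule can be sketched with a txtime-sorted heap. This is only
an illustration of the [(txtime - delta) < now] check; the class name is
hypothetical, and handling of packets whose txtime has already passed is left
out.

```python
import heapq
import itertools


class ExplicitTbs:
    """Toy txtime-sorted queue for explicit mode (hypothetical name)."""

    def __init__(self, delta_ns):
        self.delta_ns = delta_ns
        self._heap = []
        # Tie-breaker that keeps FIFO order for packets with equal txtimes.
        self._seq = itertools.count()

    def enqueue(self, txtime_ns, packet):
        heapq.heappush(self._heap, (txtime_ns, next(self._seq), packet))

    def dequeue(self, now_ns):
        # Release the earliest packet only once (txtime - delta) < now.
        if self._heap and self._heap[0][0] - self.delta_ns < now_ns:
            return heapq.heappop(self._heap)[2]
        return None
```

With delta set to 150000 ns, a packet with txtime 500000 is held back until
'now' is past 350000, regardless of the order in which packets were enqueued.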

>
>  - Deadline aware qdisc to handle e.g. A/V streams. Applications are aware
>    of time constraints and provide the packet deadline. qdisc has admission
>    control. This can be a simple first comes, first served scheduler or
>    something like EDF which allows optimized utilization. The qdisc sets
>    the TX time depending on the deadline and feeds into the root.

This will be provided by tbs if the socket which is transmitting packets is
configured for deadline mode.

For the deadline -> txtime conversion, what I have in mind is: when dequeue is
called tbs will just change the skbuff's timestamp from the deadline to 'now'
(i.e. as soon as possible) and dequeue the packet. Would that be enough or
should we use the delta parameter of the qdisc in this case and make [txtime =
now + delta]? The only benefit of doing so would be to provide a configurable
'fudge' factor.

Another question for this mode (but perhaps that applies to both modes) is, what
if the qdisc misses the deadline for *any* reason? I'm assuming it should drop
the packet during dequeue.
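
Both options for the conversion, plus the drop-on-miss assumption, fit in one
small sketch. It is not a settled design: the class name is hypothetical,
delta = 0 gives the plain [txtime = now] variant, and treating
[now + delta > deadline] as a missed deadline is an assumption made here.

```python
import heapq
import itertools


class DeadlineTbs:
    """Toy deadline-mode queue (hypothetical name), earliest deadline first."""

    def __init__(self, delta_ns=0):
        self.delta_ns = delta_ns
        self._heap = []
        self._seq = itertools.count()  # tie-breaker for equal deadlines
        self.dropped = 0

    def enqueue(self, deadline_ns, packet):
        heapq.heappush(self._heap, (deadline_ns, next(self._seq), packet))

    def dequeue(self, now_ns):
        """Return (txtime_ns, packet), or None if nothing can be sent."""
        while self._heap:
            deadline_ns, _, packet = heapq.heappop(self._heap)
            # Convert the deadline into a txtime: as soon as possible,
            # plus the optional fudge factor.
            txtime_ns = now_ns + self.delta_ns
            if txtime_ns > deadline_ns:
                self.dropped += 1  # deadline can no longer be met
                continue
            return txtime_ns, packet
        return None
```

Counting drops separately keeps the miss visible, which would matter if the
qdisc is later expected to report such errors back to the socket.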

Putting it all together, we end up with:

1) a new txtime aware qdisc, tbs, to be used per queue. Its cli will look like:
$ tc qdisc add (...) tbs clockid CLOCK_REALTIME delta 150000 offload sorting

2) a new cmsg-interface for setting a per-packet timestamp that will be used
either as a txtime or as a deadline by tbs (and, in the offload case, by the NIC
driver): SCM_TXTIME.

3) a new socket option: SO_TXTIME. It will be used to enable the feature for a
socket, and will have as parameters a clockid and a txtime mode (deadline or
explicit), that defines the semantics of the timestamp set on packets using
SCM_TXTIME.

4) a new #define DYNAMIC_CLOCKID 15 added to include/uapi/linux/time.h .

5) a new schedule-aware qdisc, 'tas' or 'taprio', to be used per port. Its cli
will look like what was proposed for taprio (base time being an absolute
timestamp).
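
From user space, items 2 and 3 might be used roughly as sketched below. The
numeric values and the two-field layout of the socket option are assumptions:
they follow the interface as it later appeared in the Linux uapi headers on
most architectures, and nothing in this proposal fixes them yet. The helper
names are hypothetical.

```python
import socket
import struct

# Assumed values, not defined by this proposal (see the note above).
SO_TXTIME = 61
SCM_TXTIME = SO_TXTIME
CLOCK_REALTIME = 0                 # Linux clockid value
SOF_TXTIME_DEADLINE_MODE = 1 << 0  # assumed flag selecting deadline mode


def txtime_sockopt(clockid, deadline_mode):
    """Pack the SO_TXTIME argument: a clockid plus a flags word."""
    flags = SOF_TXTIME_DEADLINE_MODE if deadline_mode else 0
    return struct.pack("=iI", clockid, flags)


def txtime_cmsg(txtime_ns):
    """Build the per-packet ancillary data entry carrying the timestamp."""
    return (socket.SOL_SOCKET, SCM_TXTIME, struct.pack("=Q", txtime_ns))
```

On a kernel with the feature, the socket would be configured once with
sock.setsockopt(socket.SOL_SOCKET, SO_TXTIME, txtime_sockopt(CLOCK_REALTIME, False))
and each packet sent with sock.sendmsg([payload], [txtime_cmsg(t)], 0, addr),
where t is either the txtime or the deadline depending on the configured mode.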

If we all agree with the above, we will start by closing on 1-4 asap and will
focus on 5 next.

How does that sound?

Thanks,
Jesus

>
>  - FIFO/PRIO/XXX for general traffic. Applications do not know anything
>    about timing constraints. These qdiscs obviously have neither admission
>    control nor do they set a TX time.  The root qdisc just pulls from there
>    when the assigned time slot is due or if it (optionally) decides to use
>    underutilized time slots from other classes.
>
>  - .... Add your favourite scheduling mode(s).
>
> Thanks,
>
>       tglx
>

Re: [RFC v3 net-next 13/18] net/sched: Introduce the TBS Qdisc
