Hi Alan,

Thanks for your comments!
> Hi Cristian,
>
> Looking at points 10 and 11 it's good to hear nodes can be dynamically
> added.

Yes, many implementations allow on-the-fly remapping of a node from one
parent to another, or simply adding more nodes post-initialization, so it is
natural for the API to provide this. (A sketch of what node add/remap
operations can look like at the API level is appended at the end of this
mail.)

> We've been trying to decide the best way to do this for support of qos on
> tunnels for some time now and the existing implementation doesn't allow
> this, so it effectively ruled out hierarchical queueing for tunnel targets
> on the output interface.
>
> Having said that, has thought been given to separating the queueing from
> being so closely tied to the Ethernet transmit process? When queueing on a
> tunnel for example we may be working with encryption. When running with an
> anti-replay window it is really much better to do the QOS (packet
> reordering) before the encryption. To support this, would it be possible
> to have a separate scheduler structure which can be passed into the
> scheduling API? This means the calling code can hang the structure off
> whatever entity it wishes to perform qos on, and we get dynamic target
> support (sessions/tunnels etc).

Yes, this is one point where we need to look for a better solution. The
current proposal attaches the hierarchical scheduler function to an ethdev,
so scheduling traffic for tunnels that have a pre-defined bandwidth is not
supported nicely. This question was also raised in VPP, but there tunnels
are supported as a type of output interface, so attaching scheduling to an
output interface also covers the tunnel case. It looks to me like nice
tunnel abstractions are a gap in DPDK as well. Any thoughts about how
tunnels should be supported in DPDK? What do other people think about this?
(For the software case, the per-tunnel librte_sched sketch appended at the
end of this mail shows one way to run QoS before encryption today.)

> Regarding the structure allocation, would it be possible to make the
> number of queues associated with a TC a compile time option which the
> scheduler would accommodate? We frequently only use one queue per tc,
> which means 75% of the space allocated at the queueing layer for that tc
> is never used. This may be specific to our implementation, but if other
> implementations do the same and folks could say so, we may get a better
> idea of whether this is a common case.
>
> Whilst touching on the scheduler, the token replenishment works using a
> division and multiplication, obviously to cater for the fact that it may
> be run after several tc windows have passed. The most commonly used
> industrial scheduler simply checks that the tc window has elapsed and then
> adds the bc. This relies on the scheduler being called within the tc
> window, though. It would be nice to have this as a configurable option,
> since it's much more efficient, assuming the infra code from which it's
> called can guarantee the calling frequency.

This is probably feedback for librte_sched as opposed to the current API
proposal, as the latter is intended to be generic/implementation-agnostic
and therefore its scope far exceeds the existing set of librte_sched
features. Btw, we do plan on using librte_sched as the default fall-back
when the HW ethdev is not scheduler-enabled, as well as the implementation
of choice for a lot of use-cases where it fits really well, so we do have to
continue to evolve and improve librte_sched feature-wise and
performance-wise. (The queues-per-TC and token replenishment sketches
appended at the end of this mail illustrate both points.)

> I hope you'll consider these points for inclusion into a future road map.
> Hopefully in the future my employer will increase the priority of some of
> the tasks and a PR may appear on the mailing list.
>
> Thanks,
> Alan.
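
A few sketches referenced above follow; all are illustrative only.

Sketch: dynamic node add/remap. This assumes an rte_tm-style ethdev
interface (the general shape this kind of scheduler API takes), so the
function names/signatures and the node IDs below (NEW_QUEUE_NODE,
MY_TC_NODE, OTHER_TC_NODE) are placeholders rather than the exact RFC as
posted; whether such updates are accepted after hierarchy commit would
depend on the driver's dynamic update capabilities.

#include <string.h>
#include <rte_tm.h>

/* Hypothetical application-chosen node IDs. */
#define MY_TC_NODE      10
#define OTHER_TC_NODE   11
#define NEW_QUEUE_NODE  100

static int
add_and_remap_queue(uint16_t port_id)
{
	struct rte_tm_node_params np;
	struct rte_tm_error err;
	int ret;

	memset(&np, 0, sizeof(np));
	np.shaper_profile_id = RTE_TM_SHAPER_PROFILE_ID_NONE;

	/* Add a new leaf node post-initialization, under an existing TC node. */
	ret = rte_tm_node_add(port_id, NEW_QUEUE_NODE, MY_TC_NODE,
			0 /* priority */, 1 /* weight */,
			RTE_TM_NODE_LEVEL_ID_ANY, &np, &err);
	if (ret != 0)
		return ret;

	/* Later: remap the same node under a different parent on the fly. */
	return rte_tm_node_parent_update(port_id, NEW_QUEUE_NODE, OTHER_TC_NODE,
			0 /* priority */, 1 /* weight */, &err);
}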
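
Sketch: per-tunnel QoS before encryption with librte_sched. librte_sched
already has the decoupled shape Alan asks for: the scheduler is a
standalone object that the application enqueues to and drains explicitly,
so nothing ties it to the Ethernet TX path. The sketch assumes one
rte_sched_port per tunnel and hypothetical application helpers (struct
tunnel, tunnel_encrypt_and_tx()); packets are assumed to be already
classified (sched metadata written into the mbufs) before enqueue.

#include <rte_mbuf.h>
#include <rte_sched.h>

struct tunnel {
	struct rte_sched_port *sched;	/* one scheduler instance per tunnel */
	/* ... encryption context, anti-replay state, egress port, etc. ... */
};

/* Hypothetical application helper: encrypt packets and hand them to TX. */
static void tunnel_encrypt_and_tx(struct tunnel *t, struct rte_mbuf **pkts,
	int n);

static void
tunnel_qos_then_encrypt(struct tunnel *t, struct rte_mbuf **pkts, uint32_t n)
{
	struct rte_mbuf *out[64];
	int n_out;

	/* Queue cleartext packets into the tunnel's own hierarchy. */
	rte_sched_port_enqueue(t->sched, pkts, n);

	/* Dequeue in scheduled order, then encrypt and transmit, so the
	 * anti-replay window never sees QoS-induced reordering. */
	n_out = rte_sched_port_dequeue(t->sched, out, 64);
	if (n_out > 0)
		tunnel_encrypt_and_tx(t, out, n_out);
}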
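
Sketch: queues-per-TC as a build-time knob. Purely illustrative: today
rte_sched.h hard-codes 4 traffic classes per pipe and 4 queues per traffic
class (16 queues per pipe); making the queues-per-TC constant overridable
at build time would let a one-queue-per-TC deployment drop the per-pipe
queue array to a quarter of its size, which is the 75% saving Alan
mentions. The #ifndef below is hypothetical, not current librte_sched code.

/* Hypothetical build-time override; NOT in today's rte_sched.h, where the
 * value is fixed at 4. */
#ifndef RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS
#define RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS 4
#endif

#define RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE 4

/* With the override set to 1, this drops from 16 to 4 queues per pipe,
 * shrinking the queue storage allocated per pipe accordingly. */
#define RTE_SCHED_QUEUES_PER_PIPE \
	(RTE_SCHED_TRAFFIC_CLASSES_PER_PIPE * RTE_SCHED_QUEUES_PER_TRAFFIC_CLASS)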
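
Sketch: the two token replenishment styles. Field and function names are
illustrative, not the actual librte_sched internals, but the first routine
mirrors the divide/multiply update style (correct across any number of
missed periods), while the second is the cheaper "if the window elapsed,
add bc" style Alan describes, which is only correct when the caller
guarantees it runs at least once per period.

#include <stdint.h>

struct tb {
	uint64_t tb_time;               /* time of last update */
	uint64_t tb_period;             /* replenishment period (the "tc") */
	uint64_t tb_credits_per_period; /* tokens added per period (the "bc") */
	uint64_t tb_size;               /* bucket depth */
	uint64_t tb_credits;            /* current tokens */
};

/* Divide/multiply style: catches up over any number of missed periods at
 * the cost of one division and one multiplication per update. */
static inline void
tb_update_div(struct tb *tb, uint64_t now)
{
	uint64_t n_periods = (now - tb->tb_time) / tb->tb_period;

	tb->tb_credits += n_periods * tb->tb_credits_per_period;
	if (tb->tb_credits > tb->tb_size)
		tb->tb_credits = tb->tb_size;
	tb->tb_time += n_periods * tb->tb_period;
}

/* Elapsed-check style: no division, but only correct if this is called at
 * least once per tb_period. */
static inline void
tb_update_elapsed(struct tb *tb, uint64_t now)
{
	if (now - tb->tb_time >= tb->tb_period) {
		tb->tb_credits += tb->tb_credits_per_period;
		if (tb->tb_credits > tb->tb_size)
			tb->tb_credits = tb->tb_size;
		tb->tb_time = now;
	}
}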