Re: [Intel-wired-lan] [PATCH v6 net-next 07/15] net-shapers: implement shaper cleanup on queue deletion

Paolo Abeni Fri, 06 Sep 2024 07:25:47 -0700

On 9/6/24 03:25, Jakub Kicinski wrote:

For the driver -- let me flip the question around -- what do you expect
the locking scheme to be in case of channel count change? Alternatively
we could just expect the driver to take netdev->lock around the
appropriate section of code and we'd do:


void net_shaper_set_real_num_tx_queues(struct net_device *dev, ...)
{
        ...
        if (!READ_ONCE(dev->net_shaper_hierarchy))
                return;

        lockdep_assert_held(dev->lock);
        ...
}

In the IAVF case that will be problematic, as AFAICS the channel reconfis done by 2 consecutive async task, the first task - iavf_reset_task -changes the actual number of channels freeing/allocating the driverresources and the 2nd one - iavf_finish_config - notify the stackissuing netif_set_real_num_tx_queues(). iavf_reset_task can't easilywait for iavf_finish_config due to locking order.

I had a look at iavf, and there is no relevant locking around the queue
count check at all, so that doesn't help.


Yep, that is racy.

Acquiring dev->lock around set_channel() will not be enough: some driver
change the channels number i.e. when enabling XDP.


Indeed, trying to lock before calling the driver would be both a huge
job and destined to fail.

I think/fear we need to replace the dev->lock with the rtnl lock to
solve the race for good.


Maybe :( I think we need *an* answer for:
  - how we expect the driver to protect itself (assuming that the racy
    check in iavf_verify_handle() actually serves some purpose, which
    may not be true);
  - how we ensure consistency of core state (no shapers for queues which
    don't exist, assuming we agree having shapers for queues which
    don't exist is counter productive).

I agree we must delete shapers on removed/deleted queues. Thedriver/firmware could reuse the same H/W resources for a different VFand such queue must start in the new VF with a default (no shaping) config.

Reverting back to rtnl_lock for all would be sad, the scheme of
expecting the driver to take netdev->lock could work?
It's the model we effectively settled on in devlink.
Core->driver callbacks are always locked by the core,
for driver->core calls driver should explicitly take the lock
(some wrappers for lock+op+unlock are provided).


I think/guess/hope the following could work:

- the driver wraps the h/w resources reconfiguration andnetif_set_real_num_tx_queues() with dev->lock. In the iavf case, thatmeans 2 separate critical sections: in iavf_reset_task() and iniavf_finish_config().

- the core, under dev->lock, checks vs real_num_tx_queues and call theshaper ops

- the iavf shaper callbacks would still need to check the queue id vsthe current allocated hw resource number as the shapers ops could runin-between the 2 mentioned critical sections. The iavf driver couldstill act consistently with the core:


  - if real_num_tx_queues < qid < current_allocated_hw_resources
    set the shaper,
  - if current_allocated_hw_resources < qid < real_num_tx_queues do
    nothing and return success

In both the above scenarios, real_num_tx_queues will be set tocurrent_allocated_hw_resources soon by the upcomingiavf_finish_config(), the core will update the hierarchy accordingly,the status will be consistent.

I think the code should be more clear, let me try to share it ASAP (orplease block me soon ;)


Thanks,

Paolo

Re: [Intel-wired-lan] [PATCH v6 net-next 07/15] net-shapers: implement shaper cleanup on queue deletion

Reply via email to