On 9/6/24 03:25, Jakub Kicinski wrote:
For the driver -- let me flip the question around -- what do you expect
the locking scheme to be in case of channel count change? Alternatively
we could just expect the driver to take netdev->lock around the
appropriate section of code and we'd do:
void net_shaper_set_real_num_tx_queues(struct net_device *dev, ...)
{
...
if (!READ_ONCE(dev->net_shaper_hierarchy))
return;
lockdep_assert_held(dev->lock);
...
}
In the IAVF case that will be problematic, as AFAICS the channel reconf
is done by 2 consecutive async task, the first task - iavf_reset_task -
changes the actual number of channels freeing/allocating the driver
resources and the 2nd one - iavf_finish_config - notify the stack
issuing netif_set_real_num_tx_queues(). iavf_reset_task can't easily
wait for iavf_finish_config due to locking order.
I had a look at iavf, and there is no relevant locking around the queue
count check at all, so that doesn't help.
Yep, that is racy.
Acquiring dev->lock around set_channel() will not be enough: some driver
change the channels number i.e. when enabling XDP.
Indeed, trying to lock before calling the driver would be both a huge
job and destined to fail.
I think/fear we need to replace the dev->lock with the rtnl lock to
solve the race for good.
Maybe :( I think we need *an* answer for:
- how we expect the driver to protect itself (assuming that the racy
check in iavf_verify_handle() actually serves some purpose, which
may not be true);
- how we ensure consistency of core state (no shapers for queues which
don't exist, assuming we agree having shapers for queues which
don't exist is counter productive).
I agree we must delete shapers on removed/deleted queues. The
driver/firmware could reuse the same H/W resources for a different VF
and such queue must start in the new VF with a default (no shaping) config.
Reverting back to rtnl_lock for all would be sad, the scheme of
expecting the driver to take netdev->lock could work?
It's the model we effectively settled on in devlink.
Core->driver callbacks are always locked by the core,
for driver->core calls driver should explicitly take the lock
(some wrappers for lock+op+unlock are provided).
I think/guess/hope the following could work:
- the driver wraps the h/w resources reconfiguration and
netif_set_real_num_tx_queues() with dev->lock. In the iavf case, that
means 2 separate critical sections: in iavf_reset_task() and in
iavf_finish_config().
- the core, under dev->lock, checks vs real_num_tx_queues and call the
shaper ops
- the iavf shaper callbacks would still need to check the queue id vs
the current allocated hw resource number as the shapers ops could run
in-between the 2 mentioned critical sections. The iavf driver could
still act consistently with the core:
- if real_num_tx_queues < qid < current_allocated_hw_resources
set the shaper,
- if current_allocated_hw_resources < qid < real_num_tx_queues do
nothing and return success
In both the above scenarios, real_num_tx_queues will be set to
current_allocated_hw_resources soon by the upcoming
iavf_finish_config(), the core will update the hierarchy accordingly,
the status will be consistent.
I think the code should be more clear, let me try to share it ASAP (or
please block me soon ;)
Thanks,
Paolo