On 9/5/24 20:02, Paolo Abeni wrote:
> On 9/5/24 03:33, Jakub Kicinski wrote:
>> On Wed, 4 Sep 2024 15:53:39 +0200 Paolo Abeni wrote:
>>> +	net_shaper_set_real_num_tx_queues(dev, txq);
>>> +
>>> 	dev_qdisc_change_real_num_tx(dev, txq);
>>> 	dev->real_num_tx_queues = txq;
>>
>> The dev->lock has to be taken here, around those three lines,
>> and then set / group must check QUEUE ids against
>> dev->real_num_tx_queues, no? Otherwise the work
>> net_shaper_set_real_num_tx_queues() does is prone to races?
> Yes, I think such a race exists, but I'm unsure that taking the lock
> around the above code will be enough.
> E.g., if the relevant device has 16 queues, a shaper set() can race
> with a channel reconfiguration running on a different CPU:
>
>  CPU 1                                    CPU 2
>  set_channels(8)
>    driver_set_channel()
>    // actually changes the number of
>    // queues to 8; dev->real_num_tx_queues
>    // is still 16, and dev->lock is not
>    // held yet because the driver still
>    // has to call
>    // netif_set_real_num_tx_queues()
>                                           set(QUEUE_15, ...)
>                                           // will pass validation,
>                                           // but queue 15 does not
>                                           // exist anymore
> Acquiring dev->lock around set_channels() will not be enough either:
> some drivers change the channel count internally, e.g. when enabling
> XDP.
>
> I think/fear we need to replace dev->lock with the rtnl lock to
> solve the race for good.
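To make the needed serialization concrete, here is a minimal userspace sketch of the invariant: the queue-count update and the shaper validation must be serialized on the same lock. The `fake_*` types and names below are purely illustrative stand-ins, not the real kernel structures:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for struct net_device, not the kernel type. */
struct fake_dev {
	pthread_mutex_t lock;		/* models dev->lock (or rtnl) */
	unsigned int real_num_tx_queues;
};

/* Queue-count change and shaper validation serialize on the same lock,
 * so set() can never validate against a stale queue count. */
static void fake_set_real_num_tx_queues(struct fake_dev *dev,
					unsigned int txq)
{
	pthread_mutex_lock(&dev->lock);
	/* shaper pruning and qdisc update would go here as well */
	dev->real_num_tx_queues = txq;
	pthread_mutex_unlock(&dev->lock);
}

static bool fake_shaper_set_queue(struct fake_dev *dev,
				  unsigned int queue_id)
{
	bool ok;

	pthread_mutex_lock(&dev->lock);
	ok = queue_id < dev->real_num_tx_queues;
	/* actual shaper programming would happen here, still locked */
	pthread_mutex_unlock(&dev->lock);
	return ok;
}
```

Whichever lock is chosen in the end, the point is that the validation and the queue-count update must not be allowed to interleave.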
I forgot to mention that there is another, easier alternative: keep the
max queue id check in the drivers. The driver would then have to acquire
and hold the relevant driver-specific lock in the shaper callbacks
('crit_lock', in the iavf case).

Would you be ok with such a 2nd option?

Side note: I think iavf should acquire such a lock in the callbacks no
matter what, or the accesses/writes to the ring info could be racy.
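For reference, the 2nd option would look roughly like this on the driver side; again the `fake_*` names are illustrative stand-ins, not the actual iavf structures:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-in for the iavf adapter state; 'crit_lock' and
 * the field names are modeled on the driver but simplified here. */
struct fake_adapter {
	pthread_mutex_t crit_lock;
	unsigned int num_active_queues;
};

/* Option 2: the core skips the max queue id check; the driver does it
 * in the shaper callback under its own lock, so the check is
 * serialized with any channel reconfiguration. */
static bool fake_shaper_set_cb(struct fake_adapter *adapter,
			       unsigned int queue_id)
{
	bool ok;

	pthread_mutex_lock(&adapter->crit_lock);
	ok = queue_id < adapter->num_active_queues;
	/* ring/shaper programming happens here, still under crit_lock */
	pthread_mutex_unlock(&adapter->crit_lock);
	return ok;
}

/* Channel reconfiguration takes the same lock, closing the window. */
static void fake_set_channels(struct fake_adapter *adapter,
			      unsigned int n)
{
	pthread_mutex_lock(&adapter->crit_lock);
	adapter->num_active_queues = n;
	pthread_mutex_unlock(&adapter->crit_lock);
}
```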
Thanks,
Paolo