> >> > currently all the device drivers call
> >> > netif_tx_start_all_queues(dev) on open to W/A this issue, which is
> >> > strange since only real_num_tx_queues are active.
> >>
> >> You could also argue that netif_tx_start_all_queues() should only
> >> enable the real_num_tx_queues.
> >> [Although that would obviously cause all drivers to reach the
> >> 'problem' you're currently fixing].
> >
> > Yep. Basically what I pointed out.
> >
> > It seems inconsistent to have some loops using num_tx_queues and others
> > using real_num_tx_queues.
> >
> > Instead of 'fixing' one of them, we should take a deeper look, even if
> > the change looks fine.
> >
> > num_tx_queues should be used in code that runs once, like
> > netdev_lockdep_set_classes(), but other loops should probably use
> > real_num_tx_queues.
> >
> > Anyway, all these changes should definitely target net-next, not the net
> > tree.
> >
> 
> But for the long term, you have a point.
> We will consider a deeper fix for net-next as you suggested, and drop this
> temporary fix.

I think we've actually managed to hit an issue with qede [& modified bnx2x]
due to netif_tx_start_all_queues() starting all Tx-queues -
when the number of channels on an interface is reduced, the driver reloads,
after which the xmit function receives an SKB with a too-high txq index.
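
[To make the failure mode concrete, here is a minimal, purely illustrative
guard one could place in a driver's ndo_start_xmit - foo_start_xmit and the
drop policy are hypothetical, not the actual qede code:]

/* An SKB dequeued from a qdisc that was filled before the channel
 * reduction still carries a queue_mapping beyond the now-active range.
 */
static netdev_tx_t foo_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	u16 txq_index = skb_get_queue_mapping(skb);

	if (unlikely(txq_index >= dev->real_num_tx_queues)) {
		/* Stale mapping from before the reload; drop rather than
		 * index past the active Tx rings.
		 */
		dev_kfree_skb_any(skb);
		return NETDEV_TX_OK;
	}

	/* ... normal transmit on ring txq_index ... */
	return NETDEV_TX_OK;
}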

Investigation seems to indicate that some TCP traffic arrived during the
reload, got enqueued on the qdisc with a high txq, and was then transmitted
as-is after Tx was re-enabled.
[Removing the modulo from bnx2x's select_queue() led to the same issue.]
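
[For clarity, the 'modulo' refers to the kind of fold sketched below -
illustrative only, written against the current ndo_select_queue signature
rather than the exact bnx2x code:]

/* Fold whatever index the stack picked back into the currently active
 * range, so a stale/high queue_mapping never reaches start_xmit.
 */
static u16 foo_select_queue(struct net_device *dev, struct sk_buff *skb,
			    struct net_device *sb_dev)
{
	u16 txq = netdev_pick_tx(dev, skb, sb_dev);

	return txq % dev->real_num_tx_queues;
}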
