2016-09-16 12:53 GMT-07:00 Eric Dumazet <eric.duma...@gmail.com>:
> On Fri, 2016-09-16 at 10:57 -0700, Michael Ma wrote:
>> This is actually the problem - if flows from different RX queues are
>> switched to the same RX queue in IFB, they'll use different processor
>> context with the same tasklet, and the processor context of different
>> tasklets might be the same. So multiple tasklets in IFB competes for
>> the same core when queue is switched.
>> The following simple fix proved this - with this change even switching
>> the queue won't affect small packet bandwidth/latency anymore:
>> in ifb.c:
>> - struct ifb_q_private *txp = dp->tx_private +
>> + struct ifb_q_private *txp = dp->tx_private +
>> (smp_processor_id() % dev->num_tx_queues);
>> This should be more efficient since we're not sending the task to a
>> different processor, instead we try to queue the packet to an
>> appropriate tasklet based on the processor ID. Will this cause any
>> packet out-of-order problem? If packets from the same flow are queued
>> to the same RX queue due to RSS, and processor affinity is set for RX
>> queues, I assume packets from the same flow will end up in the same
>> core when tasklet is scheduled. But I might have missed some uncommon
>> cases here... Would appreciate if anyone can provide more insights.
> Wait, don't you have proper smp affinity for the RX queues on your NIC ?
> ( Documentation/networking/scaling.txt RSS IRQ Configuration )
Yes - what I was trying to say is that this change will be more
efficient than using smp_call_function_single() to schedule the
tasklet to a different processor.
RSS IRQ should be set properly already. The issue here is that I'll
need to switch the queue mapping for NIC RX to a different TXQ on IFB,
which allows me to classify the flows at the IFB TXQ layer and avoid
qdisc lock contention.
When that switch happens, ideally processor core shouldn't be switched
because all the thread context isn't changed. The work in tasklet
should be scheduled to the same processor as well. That's why I tried
this change. Also conceptually IFB is a software device which should
be able to schedule its workload independent from how NIC is
configured for the interrupt handling.
> A driver ndo_start_xmit() MUST use skb_get_queue_mapping(skb), because
> the driver queue is locked before ndo_start_xmit()) (for non
> NETIF_F_LLTX drivers at least)
Thanks a lot for pointing out this! I was expecting this kind of
guidance... Then the options would be:
1. Use smp_call_function_single() to schedule the tasklet to a core
statically mapped to the IFB TXQ, which is very similar to how TX/RX
IRQ is configured.
2. As you suggested below add some additional action to do the
rescheduling before entering IFB - for example when receiving the
packet we could just use RSS to redirect to the desired RXQ, however
this doesn't seem to be easy, especially compared with the way how
mqprio chooses the queue. The challenge here is that IFB queue
selection is based on queue_mapping when skb arrives at IFB and core
selection is based on RXQ on NIC and so it's also based on
queue_mapping when skb arrives at NIC. Then these two queue_mappings
must be the same so that there is no core conflict of processing two
TXQs of IFB. Then this essentially means we have to change queue
mapping of the NIC on the receiver side which can't be achieved using
> In case of ifb, __skb_queue_tail(&txp->rq, skb); could corrupt the skb
> In any case, you could have an action to do this before reaching IFB.