On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer wrote:
> Well, I would prefer you to implement those. I just did a quick
> implementation (it's trivially easy) so I have something to benchmark
> with. The performance boost is quite impressive!
sounds good, but wait
> One reason I didn't "just" send a patch, is that Edward so far only
> implemented netif_receive_skb_list() and not napi_gro_receive_list().
sfc doesn't support GRO?! That doesn't make sense... Edward?
> And your driver uses napi_gro_receive(). This sort-of disables GRO for
> your driver, which is not a choice I can make. Interestingly I get
> around the same netperf TCP_STREAM performance.
Getting the same TCP performance
  with GRO and no rx-batching
  without GRO but with rx-batching
is far from an intuitive result to me, unless both techniques
mostly serve to eliminate lots of instruction cache misses and the
TCP stack is so well optimized that, once the code is in the cache,
going through it once with a 64K byte GRO-ed packet costs about the
same as going through it ~40 (64K/1500) times with non GRO-ed packets.
What's the baseline (with GRO and no rx-batching) number on your setup?
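To make the 64K/1500 estimate above concrete, here is a minimal sketch of the arithmetic in C. The helper name stack_traversals() is hypothetical, and it assumes a 64 KiB aggregated packet split into 1500-byte wire packets, ignoring header overhead. It only counts packets; it measures nothing.

```c
#include <stdio.h>

/*
 * Hypothetical helper: how many packets of pkt_size bytes are needed to
 * carry `bytes` of data, rounded up. Without GRO, each of those packets
 * is one separate trip through the stack; with GRO they are merged and
 * the stack is traversed once.
 * Assumption: pkt_size is the full per-packet size (headers ignored).
 */
static unsigned int stack_traversals(unsigned int bytes, unsigned int pkt_size)
{
	if (pkt_size == 0)
		return 0; /* avoid division by zero on bad input */

	return (bytes + pkt_size - 1) / pkt_size;
}
```

Calling stack_traversals(64 * 1024, 1500) returns 44, which is in line with the rough ~40 figure.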
> I assume we can get even better perf if we "listify" napi_gro_receive.
Yeah, it would be very interesting to get there.