On Thu, Feb 15, 2018 at 3:29 AM, Felix Fietkau <n...@nbd.name> wrote: > On 2018-02-14 16:20, Rosen Penev wrote: >> On Tue, Feb 13, 2018 at 11:10 PM, Felix Fietkau <n...@nbd.name> wrote: >>> On 2018-02-13 23:53, Rosen Penev wrote: >>>> Based on Qualcomm driver. Improves iperf3 throughput by ~20mbps on >>>> transmit on Archer C7v4. >>>> >>>> Signed-off-by: Rosen Penev <ros...@gmail.com> >>>> --- >>>> .../drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c | 14 >>>> +++++++------- >>>> 1 file changed, 7 insertions(+), 7 deletions(-) >>>> >>>> diff --git >>>> a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c >>>> >>>> b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c >>>> index 95682b7641..d32f220178 100644 >>>> --- >>>> a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c >>>> +++ >>>> b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c >>>> @@ -797,11 +797,14 @@ static netdev_tx_t ag71xx_hard_start_xmit(struct >>>> sk_buff *skb, >>>> if (ag71xx_has_ar8216(ag)) >>>> ag71xx_add_ar8216_header(ag, skb); >>>> >>>> - if (skb->len <= 4) { >>>> + dma_cache_sync (NULL, skb->data, skb->len, DMA_TO_DEVICE); >>> The use of dma_cache_sync here makes no sense, since it's the wrong API. >>> Also, effectively it results in the same kind of cache flush as the one >>> that's done by the DMA mapping done later. >> >> From my reading of dma_cache_flush, I agree. However that's the part >> of this patch that gives the biggest speedup. Before this patch, I >> tested just adding that and it worked. I can back this up with >> benchmarks later on. > In my test I quite often encountered big differences in throughput from > minor (and often very much unrelated) changes. I haven't found the real > source of these differences yet, so it's hard to know which changes > really help in the long run. My best guess is that some changes affect > the alignment of critical functions and thus affect the icache footprint > somehow. > > Until I see a reasonable explanation of how this change helps, I'm going > to assume it's the same kind of random change fluctuation that I've seen > so often and NACK this change.
So I just benchmarked this while not connected to anything. Basically, WAN port assigned to LAN interface (it's the same CPU port - eth0) and br-lan removed so that eth0.1 is assigned to LAN. I noticed something bizarre. With the patch applied, I was getting ~150mbps on iperf3 consistently. With the patch not applied, I was getting the same speed, EXCEPT the speed would sometimes bump up to ~250mbps. Adding dma_cache_sync eliminated this boost. I have no explanation for this. Maybe TX queue gets full? Sample result: Server listening on 5201 ----------------------------------------------------------- Accepted connection from 192.168.1.1, port 40080 [ 5] local 192.168.1.2 port 5201 connected to 192.168.1.1 port 40082 [ ID] Interval Transfer Bitrate [ 5] 0.00-1.00 sec 17.0 MBytes 143 Mbits/sec [ 5] 1.00-2.00 sec 18.1 MBytes 152 Mbits/sec [ 5] 2.00-3.00 sec 17.5 MBytes 147 Mbits/sec [ 5] 3.00-4.00 sec 17.6 MBytes 148 Mbits/sec [ 5] 4.00-5.00 sec 17.8 MBytes 149 Mbits/sec [ 5] 5.00-6.00 sec 18.0 MBytes 151 Mbits/sec [ 5] 6.00-7.00 sec 25.5 MBytes 214 Mbits/sec [ 5] 7.00-8.00 sec 30.9 MBytes 259 Mbits/sec [ 5] 8.00-9.00 sec 30.4 MBytes 255 Mbits/sec [ 5] 9.00-10.00 sec 31.1 MBytes 261 Mbits/sec [ 5] 10.00-10.06 sec 1.77 MBytes 250 Mbits/sec > > - Felix _______________________________________________ Lede-dev mailing list Lede-dev@lists.infradead.org http://lists.infradead.org/mailman/listinfo/lede-dev