Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

Joel Wirāmu Pauling Tue, 02 Sep 2014 03:06:19 -0700

https://wikidevi.com/wiki/Securifi_Almond%2B


On 2 September 2014 22:05, Joel Wirāmu Pauling <[email protected]> wrote:
> On a somewhat related note - I've just received my NZ/AU Region
> Almond+ which is an arm9 Dual core router based on the Cortina CSC SoC
> :
>
> https://www.cortina-systems.com/product/digital-home-processors/16-products/996-cs7542-cs7522
>
> More details :
>
> On 2 September 2014 21:27, Jonathan Morton <[email protected]> wrote:
>>
>> On 2 Sep, 2014, at 1:14 am, Aaron Wood wrote:
>>
>>>> For the purposes of shaping, the CPU shouldn't need to touch the majority 
>>>> of the payload - only the headers, which are relatively small.  The bulk 
>>>> of the payload should DMA from one NIC to RAM, then DMA back out of RAM to 
>>>> the other NIC.  It has to do that anyway to route them, and without 
>>>> shaping there'd be more of them to handle.  The difference might be in the 
>>>> data structures used by the shaper itself, but I think those are also 
>>>> reasonably compact.  It doesn't even have to touch userspace, since it's 
>>>> not acting as the endpoint as my PowerBook was during my tests.
>>>
>>> In an ideal case, yes.  But is that how this gets managed?  (I have no 
>>> idea, I'm certainly not a kernel developer).
>>
>> It would be monumentally stupid to integrate two GigE MACs onto an SoC, and 
>> then to call it a "network processor", without adequate DMA support.  I 
>> don't think Atheros are that stupid.
>>
>> Here's a more detailed datasheet:
>>         
>> http://pdf.datasheetarchive.com/indexerfiles/Datasheets-SW6/DSASW00118777.pdf
>>
>> "Another memory factor is the ability to support multiple I/O operations in 
>> parallel via the WNPU's various ports. The on-chip SRAM in AR7100 WNPUs has 
>> 5 ports that enable simultaneous access to and from five sources: the two 
>> gigabit Ethernet ports, the PCI port, the USB 2.0 port and the MIPS 
>> processor."
>>
>> It's a reasonable question, however, whether the driver uses that support 
>> properly.  Mainline Linux kernel code seems to support the SoC but not the 
>> Ethernet; if it were just a minor variant of some other Atheros hardware, 
>> I'd have expected to see it integrated into one of the existing drivers.  Or 
>> maybe it is, and my greps just aren't showing it.
>>
>> At minimum, however, there are MMIO ranges reported for each MAC during 
>> OpenWRT's boot sequence.  That's where the ring buffers are.  The most the 
>> CPU has to do is read each packet from RAM and write it into those buffers, 
>> or vice versa for receive - I think that's what my PowerBook has to do.  
>> Ideally, a bog-standard DMA engine would take over that simple duty.  Either 
>> way, that's something that has to happen whether it's shaped or not, so it's 
>> unlikely to be our problem.
>>
>> The same goes for the wireless MACs, incidentally.  These are standard ath9k 
>> mini-PCI cards, and the drivers *are* in mainline.  There shouldn't be any 
>> surprises with them.
>>
>>> If the packet data is getting moved about from buffer to buffer (for 
>>> instance to do the htb calculations?) could that substantially change the 
>>> processing load?
>>
>> The qdiscs only deal with packet and socket headers, not the full packet 
>> data.  Even then, they largely pass pointers around, inserting the headers 
>> into linked lists rather than copying them into arrays.  I believe a lot of 
>> attention has been directed at cache-friendliness in this area, and the MIPS 
>> caches are of conventional type.
>>
>>>> Which brings me back to the timers, and other items of black magic.
>>>
>>> Which would point to under-utilizing the processor core, while still having 
>>> high load? (I'm not seeing that, I'm curious if that would be the case).
>>
>> It probably wouldn't manifest as high system load.  Rather, poor timer 
>> resolution or latency would show up as excessive delays between packets, 
>> during which the CPU is idle.  The packet egress times may turn out to be 
>> quantised - that would be a smoking gun, if detectable.
>>
>>>> Incidentally, transfer speed benchmarks involving wireless will certainly 
>>>> be limited by the wireless link.  I assume that's not a factor here.
>>>
>>> That's the usual suspicion.  But these are RF-chamber, short-range lab 
>>> setups where the radios are running at full speed in perfect environments...
>>
>> Sure.  But even turbocharged 'n' gear tops out at 450Mbps signalling, and 
>> much less than that is available even theoretically for TCP/IP throughput.  
>> My point is that you're probably not running *your* tests over wireless.
>>
>>> What this makes me realize is that I should go instrument the cpu stats 
>>> with each of the various operating modes:
>>>
>>> * no shaping, anywhere
>>> * egress shaping
>>> * egress and ingress shaping at various limited levels:
>>>     * 10Mbps
>>>     * 20Mbps
>>>     * 50Mbps
>>>     * 100Mbps
>>
>> Smaller increments at the high end of the range may prove to be useful.  I 
>> would expect the CPU usage to climb nonlinearly (busy-waiting) if there's a 
>> bottleneck in a peripheral device, such as the PCI bus.  The way the kernel 
>> classifies that usage may also be revealing.
>>
>>> Heck, what about running HTB simply from a 1ms timer instead of from a data 
>>> driven timer?
>>
>> That might be what's already happening.  We have to figure out that before 
>> we can work out a solution.
>>
>>  - Jonathan Morton
>>
>> _______________________________________________
>> Cerowrt-devel mailing list
>> [email protected]
>> https://lists.bufferbloat.net/listinfo/cerowrt-devel
_______________________________________________
Cerowrt-devel mailing list
[email protected]
https://lists.bufferbloat.net/listinfo/cerowrt-devel

Re: [Cerowrt-devel] [Bloat] Comcast upped service levels -> WNDR3800 can't cope...

Reply via email to