https://wikidevi.com/wiki/Securifi_Almond%2B
On 2 September 2014 22:05, Joel Wirāmu Pauling <[email protected]> wrote:
> On a somewhat related note - I've just received my NZ/AU region
> Almond+, which is an ARM Cortex-A9 dual-core router based on the
> Cortina CS7542 SoC:
>
> https://www.cortina-systems.com/product/digital-home-processors/16-products/996-cs7542-cs7522
>
> More details:
>
> On 2 September 2014 21:27, Jonathan Morton <[email protected]> wrote:
>>
>> On 2 Sep, 2014, at 1:14 am, Aaron Wood wrote:
>>
>>>> For the purposes of shaping, the CPU shouldn't need to touch the
>>>> majority of the payload - only the headers, which are relatively
>>>> small. The bulk of the payload should DMA from one NIC to RAM, then
>>>> DMA back out of RAM to the other NIC. It has to do that anyway to
>>>> route them, and without shaping there'd be more of them to handle.
>>>> The difference might be in the data structures used by the shaper
>>>> itself, but I think those are also reasonably compact. It doesn't
>>>> even have to touch userspace, since it's not acting as the endpoint
>>>> as my PowerBook was during my tests.
>>>
>>> In an ideal case, yes. But is that how this gets managed? (I have no
>>> idea; I'm certainly not a kernel developer.)
>>
>> It would be monumentally stupid to integrate two GigE MACs onto an
>> SoC, and then to call it a "network processor", without adequate DMA
>> support. I don't think Atheros are that stupid.
>>
>> Here's a more detailed datasheet:
>>
>> http://pdf.datasheetarchive.com/indexerfiles/Datasheets-SW6/DSASW00118777.pdf
>>
>> "Another memory factor is the ability to support multiple I/O
>> operations in parallel via the WNPU's various ports. The on-chip SRAM
>> in AR7100 WNPUs has 5 ports that enable simultaneous access to and
>> from five sources: the two gigabit Ethernet ports, the PCI port, the
>> USB 2.0 port and the MIPS processor."
>>
>> It's a reasonable question, however, whether the driver uses that
>> support properly. Mainline Linux kernel code seems to support the SoC
>> but not the Ethernet; if it were just a minor variant of some other
>> Atheros hardware, I'd have expected to see it integrated into one of
>> the existing drivers. Or maybe it is, and my greps just aren't
>> showing it.
>>
>> At minimum, however, there are MMIO ranges reported for each MAC
>> during OpenWRT's boot sequence. That's where the ring buffers are.
>> The most the CPU has to do is read each packet from RAM and write it
>> into those buffers, or vice versa for receive - I think that's what
>> my PowerBook has to do. Ideally, a bog-standard DMA engine would take
>> over that simple duty. Either way, that's something that has to
>> happen whether it's shaped or not, so it's unlikely to be our
>> problem.
>>
>> The same goes for the wireless MACs, incidentally. These are standard
>> ath9k mini-PCI cards, and the drivers *are* in mainline. There
>> shouldn't be any surprises with them.
>>
>>> If the packet data is getting moved about from buffer to buffer (for
>>> instance to do the htb calculations?) could that substantially
>>> change the processing load?
>>
>> The qdiscs only deal with packet and socket headers, not the full
>> packet data. Even then, they largely pass pointers around, inserting
>> the headers into linked lists rather than copying them into arrays. I
>> believe a lot of attention has been directed at cache-friendliness in
>> this area, and the MIPS caches are of conventional type.
>>
>>>> Which brings me back to the timers, and other items of black magic.
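The descriptor-ring handoff Jonathan describes is easier to picture concretely. Below is a generic NIC-driver pattern in plain C - an illustration of the data structure only, not the AR7100's actual register layout (the field names and OWN-bit encoding are invented for the sketch). The point is that the CPU's per-packet work is a handful of word writes into the ring; moving the payload is left to the DMA engine.

/* Sketch of a TX descriptor ring: the CPU posts buffer addresses and
 * lengths (in a real driver the ring lives at the MMIO ranges printed
 * during boot) and the MAC's DMA engine moves the payload.  All names
 * and bit layouts here are illustrative, not AR7100-specific. */
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 256
#define DESC_OWN  0x8000     /* hypothetical "hardware owns this" bit */

struct desc {
    uint32_t buf_addr;       /* physical address of the packet buffer */
    uint16_t len;            /* payload length for the DMA engine */
    uint16_t flags;
};

static struct desc tx_ring[RING_SIZE];
static unsigned head;

/* Per-packet CPU cost: fill one descriptor, flip the OWN bit. */
static int post_tx(uint32_t buf_addr, uint16_t len)
{
    struct desc *d = &tx_ring[head];

    if (d->flags & DESC_OWN)
        return -1;           /* ring full: hardware still owns the slot */
    d->buf_addr = buf_addr;
    d->len      = len;
    d->flags    = DESC_OWN;  /* real driver: write barrier, then doorbell */
    head = (head + 1) % RING_SIZE;
    return 0;
}

int main(void)
{
    if (post_tx(0x1000000, 1514) == 0)
        printf("descriptor posted; payload never touched by the CPU\n");
    return 0;
}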
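The "pass pointers around" behaviour of the qdiscs can be shown the same way. This is a userspace toy, not the kernel's sk_buff/qdisc code: enqueue and dequeue are a couple of pointer writes each, and the payload buffer is never copied or even read.

/* Toy FIFO of packet descriptors: the queue touches only the intrusive
 * list link in each header, mirroring the pointer-passing pattern
 * described above (not the kernel's actual implementation). */
#include <stdio.h>
#include <stdlib.h>

struct pkt {
    unsigned char *payload;  /* DMA'd buffer; the queue never copies it */
    size_t         len;
    struct pkt    *next;     /* intrusive link, lives in the header */
};

struct fifo {
    struct pkt *head, *tail;
};

static void enqueue(struct fifo *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;             /* O(1): two pointer writes, no memcpy */
}

static struct pkt *dequeue(struct fifo *q)
{
    struct pkt *p = q->head;

    if (p) {
        q->head = p->next;
        if (!q->head)
            q->tail = NULL;
    }
    return p;
}

int main(void)
{
    struct fifo q = { NULL, NULL };
    unsigned char buf[1514];            /* stands in for a DMA buffer */
    struct pkt p = { buf, sizeof buf, NULL };

    enqueue(&q, &p);
    printf("dequeued a %zu-byte packet without touching its payload\n",
           dequeue(&q)->len);
    return 0;
}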
>>>
>>> Which would point to under-utilizing the processor core, while
>>> still having high load? (I'm not seeing that; I'm curious if that
>>> would be the case.)
>>
>> It probably wouldn't manifest as high system load. Rather, poor timer
>> resolution or latency would show up as excessive delays between
>> packets, during which the CPU is idle. The packet egress times may
>> turn out to be quantised - that would be a smoking gun, if
>> detectable.
>>
>>>> Incidentally, transfer speed benchmarks involving wireless will
>>>> certainly be limited by the wireless link. I assume that's not a
>>>> factor here.
>>>
>>> That's the usual suspicion. But these are RF-chamber, short-range
>>> lab setups where the radios are running at full speed in perfect
>>> environments...
>>
>> Sure. But even turbocharged 'n' gear tops out at 450Mbps signalling,
>> and much less than that is available even theoretically for TCP/IP
>> throughput. My point is that you're probably not running *your* tests
>> over wireless.
>>
>>> What this makes me realize is that I should go instrument the CPU
>>> stats with each of the various operating modes:
>>>
>>> * no shaping, anywhere
>>> * egress shaping
>>> * egress and ingress shaping at various limited levels:
>>>   * 10Mbps
>>>   * 20Mbps
>>>   * 50Mbps
>>>   * 100Mbps
>>
>> Smaller increments at the high end of the range may prove to be
>> useful. I would expect the CPU usage to climb nonlinearly
>> (busy-waiting) if there's a bottleneck in a peripheral device, such
>> as the PCI bus. The way the kernel classifies that usage may also be
>> revealing.
>>
>>> Heck, what about running HTB simply from a 1ms timer instead of from
>>> a data-driven timer?
>>
>> That might be what's already happening. We have to figure that out
>> before we can work out a solution.
>>
>> - Jonathan Morton
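Jonathan's quantisation "smoking gun" should be detectable from the receive side. A rough sketch, assuming a sender flooding UDP packets to port 9000 (the port, packet count and bucket width are arbitrary choices for the sketch): histogram the inter-packet gaps and look for spikes at multiples of a timer tick.

/* Histogram inter-packet gaps on the receiver.  Spikes at multiples of
 * a coarse tick (e.g. 1 ms) would suggest egress times are quantised. */
#include <stdio.h>
#include <time.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in sa = {
        .sin_family = AF_INET,
        .sin_port   = htons(9000),          /* arbitrary test port */
    };
    char buf[2048];
    long hist[21] = { 0 };                  /* 100 us buckets up to 2 ms */
    struct timespec prev = { 0 }, now;

    if (bind(fd, (struct sockaddr *)&sa, sizeof sa) < 0)
        return 1;

    for (int i = 0; i < 100000; i++) {
        if (recv(fd, buf, sizeof buf, 0) < 0)
            break;
        clock_gettime(CLOCK_MONOTONIC, &now);
        if (prev.tv_sec) {
            long gap_us = (now.tv_sec - prev.tv_sec) * 1000000 +
                          (now.tv_nsec - prev.tv_nsec) / 1000;
            long b = gap_us / 100;
            hist[b > 20 ? 20 : b]++;
        }
        prev = now;
    }
    for (int b = 0; b <= 20; b++)
        printf("%4d-%4d us: %ld\n", b * 100, b * 100 + 99, hist[b]);
    return 0;
}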
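For the instrumentation list above, the per-configuration CPU numbers can come straight from /proc/stat deltas taken around each test run. A minimal sketch (field order per proc(5)); how the kernel classifies the busy time - user vs. system vs. softirq vs. idle - is exactly the part Jonathan suggests watching.

/* Sample aggregate CPU counters before and after a test window and
 * print the deltas.  softirq is where NAPI/qdisc work tends to land;
 * nonlinear growth of busy time as the shaped rate rises would fit the
 * busy-waiting hypothesis above. */
#include <stdio.h>
#include <unistd.h>

struct cpu { long long user, nice, sys, idle, iowait, irq, softirq; };

static int sample(struct cpu *c)
{
    FILE *f = fopen("/proc/stat", "r");

    if (!f)
        return -1;
    fscanf(f, "cpu %lld %lld %lld %lld %lld %lld %lld",
           &c->user, &c->nice, &c->sys, &c->idle,
           &c->iowait, &c->irq, &c->softirq);
    fclose(f);
    return 0;
}

int main(void)
{
    struct cpu a, b;

    if (sample(&a) < 0)
        return 1;
    sleep(10);       /* run the throughput test during this window */
    sample(&b);
    printf("user %lld sys %lld softirq %lld idle %lld iowait %lld\n",
           b.user - a.user, b.sys - a.sys, b.softirq - a.softirq,
           b.idle - a.idle, b.iowait - a.iowait);
    return 0;
}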
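Aaron's 1ms-timer idea amounts to refilling a token bucket from a fixed tick instead of from packet arrivals. A sketch of just that mechanism using Linux's timerfd (the rate and tick values are illustrative, and the dequeue step is elided); the closing comment notes the burst granularity such a design would trade away.

/* Token bucket refilled from a periodic 1 ms timerfd tick rather than
 * a data-driven timer.  Illustrative rate/tick only. */
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/timerfd.h>

#define RATE_BPS (100 * 1000 * 1000 / 8)   /* 100 Mbit/s in bytes/sec */
#define TICK_NS  1000000                   /* 1 ms */

int main(void)
{
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    struct itimerspec its = {
        .it_interval = { 0, TICK_NS },
        .it_value    = { 0, TICK_NS },
    };
    long long tokens = 0;
    uint64_t expirations;

    timerfd_settime(tfd, 0, &its, NULL);

    for (int tick = 0; tick < 1000; tick++) {
        read(tfd, &expirations, sizeof expirations);  /* blocks ~1 ms */
        tokens += (long long)expirations * RATE_BPS * TICK_NS / 1000000000;
        /* ...dequeue and send while tokens > 0, subtracting each
         * packet's size.  With a 1 ms tick, up to RATE_BPS/1000 bytes
         * (12.5 kB here) leave back-to-back each tick: that burstiness
         * is the cost of giving up the data-driven timer. */
    }
    printf("unspent tokens after 1 s: %lld bytes\n", tokens);
    close(tfd);
    return 0;
}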
