Re: ARM target does not boot after memory remap
On Mon, Sep 13, 2010 at 04:30:17PM +0200, Robin Theunis wrote:
> I have compiled the kernel with earlyprintk and DEBUG_LL enabled.
> It still does nothing after that line.

Please don't top-post.

Did you add "earlyprintk" to your kernel command line, as the
EARLY_PRINTK menuconfig help text suggests?

arch/arm/mach-at91/include/mach/debug-macro.S also suggests that the
low-level debug output goes to the AT91 debug unit, not to a normal
UART (I'm not sure about this, I don't know much about AT91).

Did you try to dump __log_buf using JTAG?

HTH
Johannes
--
To unsubscribe from this list: send the line "unsubscribe linux-embedded"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
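[Editor's note: as an illustration of the earlyprintk suggestion above,
the bootloader change amounts to something like the following. The
console device, baud rate, and use of u-boot's setenv are assumptions
for illustration, not taken from Robin's actual setup.]

```shell
# In u-boot, add "earlyprintk" to the kernel command line so the
# EARLY_PRINTK code is actually activated; on AT91 the DEBUG_LL
# output typically goes to the debug unit (DBGU), not a normal UART.
# Console device and baud rate below are assumptions:
setenv bootargs 'console=ttyS0,115200 earlyprintk'
saveenv
```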
Re: 100Mbit ethernet performance on embedded devices
On Thu, Aug 20, 2009 at 02:56:49PM +0200, Johannes Stezenbach wrote:
> On Wed, Aug 19, 2009 at 04:35:34PM +0100, Jamie Lokier wrote:
> > Johannes Stezenbach wrote:
> > > TCP RX ~70Mbit/sec (iperf -s on SoC, iperf -c on desktop PC)
> > > TCP TX ~56Mbit/sec (iperf -s on desktop PC, iperf -c on SoC)
> > >
> > > The CPU load during the iperf test is around 1% user, 44% system,
> > > 4% irq, 48% softirq, with 7500 irqs/sec. The kernel used in these
> > > measurements does not have iptables support; I think packet
> > > filtering would slow it down noticeably, but I didn't actually
> > > try. The ethernet driver uses NAPI, but it doesn't seem to be a
> > > win judging from the irq/sec number.
> >
> > You should see far fewer interrupts if NAPI was working properly.
> > Rather than NAPI not being a win, it looks like it's not active at
> > all. 7500/sec is close to the packet rate for sending TCP with
> > full-size ethernet packets over a 100Mbit ethernet link.
>
> From debug output I can see that NAPI works in principle, however the
> timing seems to be such that ->poll() almost always completes before
> the next packet is received. I followed the NAPI_HOWTO.txt which came
> with the 2.6.20 kernel. The delay between irq -> netif_rx_schedule()
> -> NET_RX_SOFTIRQ -> ->poll() doesn't seem to be long enough. But of
> course my understanding of NAPI is very limited, probably I missed
> something...

It would've been nice to get a comment on this. Yeah I know, old
kernel, non-mainline driver...

On this platform NAPI seems to be a win when receiving small packets,
but not for a single max-bandwidth TCP stream. The folks at
stlinux.com seem to be using a dedicated hw timer to delay the NAPI
poll() calls:
http://www.stlinux.com/drupal/kernel/network/stmmac-optimizations
This of course adds some latency to the packet processing, however in
the single TCP stream case this wouldn't matter.
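[Editor's note: Jamie's packet-rate observation can be sanity-checked
with a quick back-of-the-envelope calculation. The 1538-byte figure is
the standard wire cost of a full-sized frame (preamble + maximum frame
+ inter-frame gap), not a number from the thread.]

```shell
# Theoretical frame rate of a saturated 100Mbit ethernet link with
# full-sized frames: preamble (8) + max frame (1518) + inter-frame
# gap (12) = 1538 bytes on the wire per packet.
echo $(( 100000000 / (1538 * 8) ))   # prints 8127
```

So ~8100 frames/sec at wire speed; a driver taking 7500 irqs/sec at
~70Mbit/sec goodput is indeed taking roughly one interrupt per packet,
which is what NAPI is supposed to avoid.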
Thanks,
Johannes
Re: 100Mbit ethernet performance on embedded devices
On Wed, Aug 19, 2009 at 04:35:34PM +0100, Jamie Lokier wrote:
> Johannes Stezenbach wrote:
> > TCP RX ~70Mbit/sec (iperf -s on SoC, iperf -c on desktop PC)
> > TCP TX ~56Mbit/sec (iperf -s on desktop PC, iperf -c on SoC)
> >
> > The CPU load during the iperf test is around 1% user, 44% system,
> > 4% irq, 48% softirq, with 7500 irqs/sec. The kernel used in these
> > measurements does not have iptables support; I think packet
> > filtering would slow it down noticeably, but I didn't actually try.
> > The ethernet driver uses NAPI, but it doesn't seem to be a win
> > judging from the irq/sec number.
>
> You should see far fewer interrupts if NAPI was working properly.
> Rather than NAPI not being a win, it looks like it's not active at
> all. 7500/sec is close to the packet rate for sending TCP with
> full-size ethernet packets over a 100Mbit ethernet link.

From debug output I can see that NAPI works in principle, however the
timing seems to be such that ->poll() almost always completes before
the next packet is received. I followed the NAPI_HOWTO.txt which came
with the 2.6.20 kernel. The delay between irq -> netif_rx_schedule()
-> NET_RX_SOFTIRQ -> ->poll() doesn't seem to be long enough. But of
course my understanding of NAPI is very limited, probably I missed
something...

> > What I'm interested in are some numbers for similar hardware, to
> > find out if my hardware and/or ethernet driver can be improved, or
> > if the CPU will always be the limiting factor.
>
> I have a SoC with a 166MHz ARMv4 (ARM7TDMI I think, but I'm not
> sure), and an external RTL8139 100Mbit ethernet chip over the SoC's
> PCI bus. It gets a little over 80Mbit/s actual data throughput in
> both directions, running a simple FTP client.

I found one interesting page which defines network driver performance
in terms of CPU MHz per Mbit:
http://www.stlinux.com/drupal/node/439
I can't really tell from their table how big a win HW csum is, but
what they call "interrupt mitigation optimisations" (IOW: working
NAPI) seems important.
(compare the values for STx7105)

If someone has an embedded platform with 100Mbit ethernet where they
can switch HW checksum via ethtool and benchmark both under equal
conditions, that would be very interesting.

Thanks
Johannes
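[Editor's note: for comparison with the stlinux.com table, the
MHz-per-Mbit figure for the 200MHz SoC in this thread can be estimated
from the reported iperf load. The ~96% busy figure is simply the sum
of the quoted system/irq/softirq percentages.]

```shell
# MHz per Mbit for the 200MHz ARM926EJ-S at ~70Mbit/s TCP RX:
# CPU is ~96% busy (44% system + 4% irq + 48% softirq).
awk 'BEGIN { printf "%.1f MHz/Mbit\n", 200 * 0.96 / 70 }'
```

This prints 2.7 MHz/Mbit, which gives a rough basis for comparing
against the per-driver numbers on the stlinux.com page.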
100Mbit ethernet performance on embedded devices
Hi,

a while ago I was working on a SoC with a 200MHz ARM926EJ-S CPU and an
integrated 100Mbit ethernet core, connected on an internal (fast)
memory bus, with DMA. With iperf I measured:

TCP RX ~70Mbit/sec (iperf -s on SoC, iperf -c on desktop PC)
TCP TX ~56Mbit/sec (iperf -s on desktop PC, iperf -c on SoC)

The CPU load during the iperf test is around 1% user, 44% system, 4%
irq, 48% softirq, with 7500 irqs/sec. The kernel used in these
measurements does not have iptables support; I think packet filtering
would slow it down noticeably, but I didn't actually try. The ethernet
driver uses NAPI, but it doesn't seem to be a win judging from the
irq/sec number. The kernel was an ancient 2.6.20.

I tried hard, but I couldn't find any performance figures for
comparison. (All performance figures I found refer to 1Gbit or 10Gbit
server type systems.) What I'm interested in are some numbers for
similar hardware, to find out if my hardware and/or ethernet driver
can be improved, or if the CPU will always be the limiting factor.

I'd also be interested to know if hardware checksumming support would
improve throughput noticeably in such a system, or if it is only
useful for 1Gbit and above. Did anyone actually manage to get close to
100Mbit/sec with similar CPU resources?

TIA,
Johannes
Re: New fast(?)-boot results on ARM
On Fri, Aug 14, 2009 at 10:43:05PM +0200, Robert Schwebel wrote:
> On Fri, Aug 14, 2009 at 10:04:57PM +0200, Denys Vlasenko wrote:
> > > r...@thebe:~$ microcom | ptx_ts
> > > U-Boot 2.0.0-rc9

Now that microcom is in Debian sid (thanks!), where can I find ptx_ts?
It seems to be quite useful.

> > > [ 0.874559] 0.003967 Hit any key to stop autoboot: 0
> >
> > boot loader is not fast. considering its simple task, it can be
> > made faster.
>
> Yup, will check.

Almost 1 s seems really long. I'm working on a SoC with a 200MHz
ARM926EJ-S. We managed to get to 1.5sec from power-on to starting
init. The main difference to your platform seems to be that we use NOR
flash. The kernel is not optimized, it still has some debug options
turned on and is used during development. (However, the 1.5sec is with
"quiet".) The root fs is cramfs. The kernel version is 2.6.20.

For u-boot we enabled the D-cache which gave a decent speedup (on
ARM926EJ-S this requires one to set up page tables and enable the MMU,
but it's not that difficult). I don't have the numbers here, but I
think it still takes ~300ms in u-boot, and ~1.2s for the kernel boot.

> > > [ 1.326621] 0.452062 loaded zImage from /dev/nand0.kernel.bb with size 1679656
> > > [ 2.009996] 0.683375 Uncompressing Linux... done, booting the kernel.
> > > [ 2.416999] 0.407003 Linux version 2.6.31-rc4-g056f82f-dirty (s...@octopus) (gcc version 4.3.2 (OSELAS.Toolchain-1.99.3) ) #1 PREEMPT Thu Aug 6 08:37:19 CEST 2009
> >
> > Other people already commented on this (kernel is too big)
>
> Not sure (the kernel is already customized for the board), but I'll
> take a look again.

We are booting an uncompressed kernel (~2.8MB). Uncompressing (running
the uncompressor XIP in NOR flash) took ~0.5s longer than copying
2.8MB from flash to RAM. BTW, we are using uImage and set verify=no in
u-boot. We use u-boot-1.3.0.

> > > [ 5.082616] 0.007992 RPC: Registered tcp transport module.
> > > [ 5.605159] 0.522543 eth0: config: auto-negotiation on, 100FDX, 100HDX, 10FDX, 10HDX.
> >
> > What is happening here? Waiting for eth link negotiation?
> > > [ 6.602621] 0.997462 IP-Config: Complete:
> > > [ 6.606638] 0.004017 device=eth0, addr=192.168.23.197, mask=255.255.0.0, gw=192.168.23.2,
> > > [ 6.614588] 0.007950 host=192.168.23.197, domain=, nis-domain=(none),
> > > [ 6.618652] 0.004064 bootserver=192.168.23.2, rootserver=192.168.23.2, rootpath=
> >
> > Well, this ~1 second is not really the kernel's fault, it's DHCP
> > delay. But do you need to do it at this moment? You do not seem to
> > be using networking filesystems. You can run a DHCP client in
> > userspace.
>
> The board has ip autoconfig configured in, because we also use
> tftp/nfs boot for development. But it had been disabled on the
> commandline:
> ip=192.168.23.197:192.168.23.2:192.168.23.2:255.255.0.0:::
> That shouldn't do dhcp, right?

Try to boot with the eth cable unplugged and see if it hangs in
IP-Config. If it were doing static configuration it would be faster.
However, unless you need ethernet to boot (NFS root) I'd suggest doing
eth config in userspace.

> > > [ 7.137924] 0.059316 starting udev
> > > [ 7.147925] 0.010001 mounting tmpfs at /dev
> > > [ 7.182299] 0.034374 creating static nodes
> > > [ 7.410613] 0.228314 starting udevd...done
> > > [ 8.811097] 1.400484 waiting for devices...done
> >
> > And suddenly devtmpfs sounds like a good idea ;-)

We use static device nodes during boot, and later set up busybox mdev
for hotplug.

Johannes
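[Editor's note: for reference, a sketch of the ip= option's field
layout as documented in the kernel's Documentation/nfsroot.txt. The
u-boot setenv line and the explicit "off" value are illustrations of
one way to rule out DHCP, not the poster's actual configuration.]

```shell
# ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
#
# Spelling out "off" in the <autoconf> field, instead of leaving it
# empty as in the quoted command line, makes the static-only intent
# explicit:
setenv bootargs "${bootargs} ip=192.168.23.197:192.168.23.2:192.168.23.2:255.255.0.0::eth0:off"
```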
Re: Recommendation for activating a deferred module init in the kernel
On Wed, Jun 18, 2008, Stefan Richter wrote:
> Johannes Stezenbach wrote:
> > I think the USB bus enumeration can take significant time:
> > recognize a device is connected, turn on bus power, try to read
> > descriptors (bus powered devices might be slow to respond after
> > power up). And this will happen even with drivers_autoprobe == 0,
> > right?
>
> Probably... I don't know which particular steps happen in the USB
> core before upper layer drivers are bound. Not binding the
> [eou]hci-hcd PCI driver would certainly be more effective.

Well, in embedded systems you often don't have a PCI bus, but platform
devices. Maybe it's as simple as delaying the USB
platform_device_register() call, I don't know.

Johannes
Re: Recommendation for activating a deferred module init in the kernel
On Wed, Jun 18, 2008 at 12:48:27AM +0200, Stefan Richter wrote:
> On Tue, 17 June 2008 12:55:31 -0700, Tim Bird wrote:
> > On Tue, 17 Jun 2008 11:28:29 -0700, Tim Bird wrote:
> > | One of the main sub-systems that we defer initialization of this
> > | way is USB, and this saves quite a bit of time. (Of course the
> > | same, or slightly more CPU cycles are eventually used during
> > | bootup time. But this lets us get to user space quicker so we
> > | can start user-visible applications faster.)
>
> What if you don't defer module initialization, but merely device
> probing? ... If you set /sys/bus/foo/drivers_autoprobe to 0 (default
> is 1), then a /sys/bus/foo/drivers/bar will not be bound to devices.
> You can trigger driver<->device binding later per device by writing a
> device's bus ID into /sys/bus/foo/drivers/bar/bind, or by writing
> into /sys/bus/foo/drivers_probe (I guess; I only used the per-device
> way so far).

I think the USB bus enumeration can take significant time: recognize a
device is connected, turn on bus power, try to read descriptors (bus
powered devices might be slow to respond after power up). And this
will happen even with drivers_autoprobe == 0, right?

OTOH I think just calling the module init function when no devices are
present on the bus doesn't need much time. If you could delay the
enumeration it would not be necessary to mess with drivers_autoprobe.
However, I don't know enough about USB so I don't know how to do it...

Johannes
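[Editor's note: a minimal sketch of the sysfs recipe Stefan describes.
The bus name "usb", the driver name "usb-storage", and the device bus
ID "1-1:1.0" are placeholders for illustration, not taken from the
thread; real names depend on the platform, and the writes require
root.]

```shell
# 1. Turn off automatic driver<->device binding for the whole bus
#    early in boot:
echo 0 > /sys/bus/usb/drivers_autoprobe

# 2. ...start the user-visible applications first, then bind one
#    device explicitly by writing its bus ID into the driver's bind
#    attribute:
echo "1-1:1.0" > /sys/bus/usb/drivers/usb-storage/bind
```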
Re: cross-compiling alternatives (was Re: [PATCH 0/1] Embedded Maintainer(s)...)
On Fri, Jun 13, 2008, Tim Bird wrote:
> YMMV. I put some of the resources and info I found at:
> http://elinux.org/Debugging_Makefiles

There is also remake, which is "a patched GNU make with a debugger,
better tracing and error reporting (based on GNU make 3.80)".
Development seems to have stopped, though.
http://sourceforge.net/projects/bashdb/

Johannes