Re: ARM target not boot after remap memory

2010-09-13 Thread Johannes Stezenbach
On Mon, Sep 13, 2010 at 04:30:17PM +0200, Robin Theunis wrote:
 
 I have compiled the kernel with early printk and DEBUG_LL enabled.  It still
 does nothing after that line.

Please don't top-post.

Did you add earlyprintk to your kernel command line
like the EARLY_PRINTK menuconfig help text suggests?
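
E.g. something like this, assuming u-boot or a similar boot loader (the
console= device is only an example, adjust it for your board and keep
whatever other options you need):

  setenv bootargs 'console=ttyS0,115200 earlyprintk'
  saveenv

On ARM the earlyprintk option takes no arguments; it just routes the
early output through the same DEBUG_LL macros.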

arch/arm/mach-at91/include/mach/debug-macro.S also suggests that
the DEBUG_LL output goes to the AT91 debug unit, not to a normal UART
(not sure about this, I don't know much about AT91).
Did you try to dump __log_buf using JTAG?
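
In case it helps, a rough way to do that with gdb over JTAG (the port is
OpenOCD's default gdb port, and 0x4000 assumes a 16 KB log buffer, i.e.
CONFIG_LOG_BUF_SHIFT=14):

  arm-linux-gdb vmlinux            # whatever cross-gdb your toolchain provides
  (gdb) target remote localhost:3333
  (gdb) dump binary memory logbuf.bin &__log_buf (char *)&__log_buf + 0x4000

and then read logbuf.bin on the host with strings or less.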


HTH
Johannes


Re: 100Mbit ethernet performance on embedded devices

2009-08-28 Thread Johannes Stezenbach
On Thu, Aug 20, 2009 at 02:56:49PM +0200, Johannes Stezenbach wrote:
 On Wed, Aug 19, 2009 at 04:35:34PM +0100, Jamie Lokier wrote:
  Johannes Stezenbach wrote:
   
 TCP RX ~70Mbit/sec  (iperf -s on SoC, iperf -c on desktop PC)
 TCP TX ~56Mbit/sec  (iperf -s on desktop PC, iperf -c on SoC)
   
   The CPU load during the iperf test is around
   1% user, 44% system, 4% irq, 48% softirq, with 7500 irqs/sec.
   
   The kernel used in these measurements does not have iptables
   support; I think packet filtering would slow it down noticeably,
   but I didn't actually try.  The ethernet driver uses NAPI,
   but it doesn't seem to be a win judging from the irq/sec number.
  
  You should see far fewer interrupts if NAPI was working properly.
  Rather than NAPI not being a win, it looks like it's not active at
  all.
  
  7500/sec is close to the packet rate for sending TCP with
  full-size ethernet packets over a 100Mbit ethernet link.
 
 From debug output I can see that NAPI works in principle, however
 the timing seems to be such that ->poll() almost always completes
 before the next packet is received.  I followed the NAPI_HOWTO.txt
 which came with the 2.6.20 kernel.  The delay between irq ->
 netif_rx_schedule() -> NET_RX_SOFTIRQ -> ->poll() doesn't seem
 to be long enough.  But of course my understanding of NAPI is
 very limited, probably I missed something...

It would've been nice to get a comment on this.  Yeah I know,
old kernel, non-mainline driver...

On this platform NAPI seems to be a win when receiving small packets,
but not for a single max-bandwidth TCP stream.  The folks at
stlinux.com seem to be using a dedicated hw timer to delay
the NAPI poll() calls:
http://www.stlinux.com/drupal/kernel/network/stmmac-optimizations

This of course adds some latency to the packet processing,
but in the single TCP stream case it wouldn't matter.
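
Just to illustrate the idea (not their actual code; the my_* helpers and
the timeout value are placeholders, and the point of a dedicated hw timer
is that jiffies-based kernel timers are far too coarse for this):

  #include <linux/netdevice.h>
  #include <linux/interrupt.h>

  /* rx irq: mask further rx interrupts and arm a short one-shot hw timer
   * instead of scheduling NAPI right away, so a few packets accumulate */
  static irqreturn_t my_rx_irq(int irq, void *dev_id)
  {
      struct net_device *dev = dev_id;

      my_mask_rx_irq(dev);            /* placeholder register access */
      my_hw_timer_arm_us(dev, 100);   /* placeholder, one-shot, ~100us */
      return IRQ_HANDLED;
  }

  /* called when the hw timer expires: hand the accumulated packets to NAPI */
  static void my_coalesce_timeout(struct net_device *dev)
  {
      netif_rx_schedule(dev);   /* NET_RX_SOFTIRQ -> ->poll() runs soon */
  }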


Thanks,
Johannes


Re: 100Mbit ethernet performance on embedded devices

2009-08-20 Thread Johannes Stezenbach
On Wed, Aug 19, 2009 at 04:35:34PM +0100, Jamie Lokier wrote:
 Johannes Stezenbach wrote:
  
TCP RX ~70Mbit/sec  (iperf -s on SoC, iperf -c on desktop PC)
TCP TX ~56Mbit/sec  (iperf -s on desktop PC, iperf -c on SoC)
  
  The CPU load during the iperf test is around
  1% user, 44% system, 4% irq, 48% softirq, with 7500 irqs/sec.
  
  The kernel used in these measurements does not have iptables
  support; I think packet filtering would slow it down noticeably,
  but I didn't actually try.  The ethernet driver uses NAPI,
  but it doesn't seem to be a win judging from the irq/sec number.
 
 You should see far fewer interrupts if NAPI was working properly.
 Rather than NAPI not being a win, it looks like it's not active at
 all.
 
 7500/sec is close to the packet rate for sending TCP with
 full-size ethernet packets over a 100Mbit ethernet link.

From debug output I can see that NAPI works in principle, however
the timing seems to be such that ->poll() almost always completes
before the next packet is received.  I followed the NAPI_HOWTO.txt
which came with the 2.6.20 kernel.  The delay between irq ->
netif_rx_schedule() -> NET_RX_SOFTIRQ -> ->poll() doesn't seem
to be long enough.  But of course my understanding of NAPI is
very limited, probably I missed something...
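
For reference, the 2.6.20-era receive path I followed from NAPI_HOWTO.txt
looks roughly like this (a sketch only; the my_* helpers are placeholders
for the driver's own register/ring access):

  #include <linux/kernel.h>
  #include <linux/netdevice.h>
  #include <linux/interrupt.h>

  /* irq handler: mask rx interrupts and schedule the softirq, nothing more */
  static irqreturn_t my_rx_interrupt(int irq, void *dev_id)
  {
      struct net_device *dev = dev_id;

      if (netif_rx_schedule_prep(dev)) {
          my_mask_rx_irq(dev);        /* placeholder register access */
          __netif_rx_schedule(dev);   /* raises NET_RX_SOFTIRQ */
      }
      return IRQ_HANDLED;
  }

  /* dev->poll: the actual rx processing, run from the softirq */
  static int my_poll(struct net_device *dev, int *budget)
  {
      int limit = min(*budget, dev->quota);
      int done = 0;

      while (done < limit && my_rx_ring_has_packet(dev)) {  /* placeholder */
          netif_receive_skb(my_rx_ring_get_skb(dev));       /* placeholder */
          done++;
      }
      *budget -= done;
      dev->quota -= done;

      if (my_rx_ring_has_packet(dev))
          return 1;              /* more work, stay on the poll list */

      netif_rx_complete(dev);    /* back to interrupt mode... */
      my_unmask_rx_irq(dev);     /* ...and re-enable the rx irq */
      return 0;
  }

  /* in probe: dev->poll = my_poll; dev->weight = 16; */

So if only one packet is in the ring per interrupt, ->poll() processes that
single packet and immediately calls netif_rx_complete(), which would match
the ~7500 irqs/sec I measured.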

  What I'm interested in are some numbers for similar hardware,
  to find out if my hardware and/or ethernet driver can be improved,
  or if the CPU will always be the limiting factor.
 
 I have a SoC with a 166MHz ARMv4 (ARM7TDMI I think, but I'm not sure),
 and an external RTL8139 100Mbit ethernet chip over the SoC's PCI bus.
 
 It gets a little over 80Mbit/s actual data throughput in both
 directions, running a simple FTP client.

I found one interesting page which defines network driver performance
in terms of CPU MHz per Mbit.
http://www.stlinux.com/drupal/node/439

I can't really tell from their table how big a win HW csum is, but
what they call "interrupt mitigation optimisations" (IOW: working NAPI)
seems important.  (Compare the values for the STx7105.)

If someone has an embedded platform with 100Mbit ethernet where they can
toggle HW checksumming via ethtool and benchmark both settings under equal
conditions, that would be very interesting.
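
Something along these lines, i.e. the identical iperf run with offload on
and off (interface name, peer address and duration are placeholders):

  # with HW checksum offload
  ethtool -K eth0 rx on tx on
  iperf -c 192.168.1.1 -t 60
  # without
  ethtool -K eth0 rx off tx off
  iperf -c 192.168.1.1 -t 60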


Thanks
Johannes


100Mbit ethernet performance on embedded devices

2009-08-19 Thread Johannes Stezenbach
Hi,

a while ago I was working on a SoC with a 200MHz ARM926EJ-S CPU
and an integrated 100Mbit ethernet core, connected to the internal
(fast) memory bus, with DMA.  With iperf I measured:

  TCP RX ~70Mbit/sec  (iperf -s on SoC, iperf -c on desktop PC)
  TCP TX ~56Mbit/sec  (iperf -s on desktop PC, iperf -c on SoC)
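
For reference, the invocations look like this (address and duration are
placeholders), with the roles swapped for the TX direction:

  soc$ iperf -s
  pc$  iperf -c <SoC address> -t 60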

The CPU load during the iperf test is around
1% user, 44% system, 4% irq, 48% softirq, with 7500 irqs/sec.

The kernel used in these measurements does not have iptables
support; I think packet filtering would slow it down noticeably,
but I didn't actually try.  The ethernet driver uses NAPI,
but it doesn't seem to be a win judging from the irq/sec number.
The kernel was an ancient 2.6.20.

I tried hard, but I couldn't find any performance figures for
comparison.  (All performance figures I found refer to 1Gbit
or 10Gbit server type systems.)

What I'm interested in are some numbers for similar hardware,
to find out if my hardware and/or ethernet driver can be improved,
or if the CPU will always be the limiting factor.
I'd also be interested to know if hardware checksumming
support would improve throughput noticeably in such a system,
or if it is only useful for 1Gbit and above.

Did anyone actually manage to get close to 100Mbit/sec
with similar CPU resources?


TIA,
Johannes


Re: New fast(?)-boot results on ARM

2009-08-15 Thread Johannes Stezenbach
On Fri, Aug 14, 2009 at 10:43:05PM +0200, Robert Schwebel wrote:
 On Fri, Aug 14, 2009 at 10:04:57PM +0200, Denys Vlasenko wrote:
   r...@thebe:~$ microcom | ptx_ts U-Boot 2.0.0-rc9

Now that microcom is in Debian sid (thanks!), where can I find ptx_ts?
It seems to be quite useful.


   [  0.874559]   0.003967 Hit any key to stop autoboot:  0
 
  boot loader is not fast. considering its simple task, it can be made
  faster.
 
 Yup, will check. Almost 1 s seems really long.


I'm working on a SoC with a 200MHz ARM926EJ-S.  We managed to get
to 1.5 sec from power-on to starting init.  The main difference from
your platform seems to be that we use NOR flash.  The kernel is
not optimized; it still has some debug options turned on and
is used during development (however, the 1.5 sec is with quiet).
The root fs is cramfs.  The kernel version is 2.6.20.

For u-boot we enabled the D-cache, which gave a decent speed-up
(on ARM926EJ-S this requires setting up page tables and enabling
the MMU, but it's not that difficult).  I don't have the numbers here,
but I think it still takes ~300ms in u-boot and ~1.2s for the kernel boot.
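
The rough idea, in case anyone wants to try it (a sketch only, not our
actual code: a real version must map peripheral windows uncached, e.g.
with descriptor 0xC12 instead of 0xC1E, and invalidate the TLB and caches
before turning things on):

  /* one 16 KB aligned level-1 table: 4096 x 1MB sections, flat 1:1 mapped */
  static unsigned long page_table[4096] __attribute__((aligned(16384)));

  static void enable_mmu_and_dcache(void)
  {
      unsigned long i, cr;

      for (i = 0; i < 4096; i++)
          page_table[i] = (i << 20) | 0xC1E;   /* cacheable section, AP=RW */

      asm volatile("mcr p15, 0, %0, c2, c0, 0" : : "r"(page_table)); /* TTB */
      asm volatile("mcr p15, 0, %0, c3, c0, 0" : : "r"(3));  /* domain 0: manager */
      asm volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(cr));
      cr |= (1 << 0) | (1 << 2);    /* M: MMU on, C: D-cache on */
      asm volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(cr));
  }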


   [  1.326621]   0.452062 loaded zImage from /dev/nand0.kernel.bb with size 1679656
   [  2.009996]   0.683375 Uncompressing Linux... done, booting the kernel.
   [  2.416999]   0.407003 Linux version 2.6.31-rc4-g056f82f-dirty (s...@octopus) (gcc version 4.3.2 (OSELAS.Toolchain-1.99.3) ) #1 PREEMPT Thu Aug 6 08:37:19 CEST 2009
  
  Other people already commented on this (kernel is too big)
 
 Not sure (the kernel is already customized for the board), but I'll take
 a look again.

We are booting an uncompressed kernel (~2.8MB).  Uncompressing (running the
decompressor XIP in NOR flash) took ~0.5s longer than copying 2.8MB from
flash to RAM.  BTW, we are using uImage and set verify=no in u-boot.
We use u-boot-1.3.0.
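
For anyone wanting to try the same, roughly (the load/entry addresses are
board-specific examples):

  # uncompressed uImage from the raw kernel Image
  mkimage -A arm -O linux -T kernel -C none \
          -a 0x20008000 -e 0x20008000 -n Linux -d arch/arm/boot/Image uImage
  # and in u-boot, skip the image checksum verification in bootm
  setenv verify no
  saveenv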


   [  5.082616]   0.007992 RPC: Registered tcp transport module.
   [  5.605159]   0.522543 eth0: config: auto-negotiation on, 100FDX, 
   100HDX, 10FDX, 10HDX.

What is happening here? Waiting for eth link negotiation?

   [  6.602621]   0.997462 IP-Config: Complete:
   [  6.606638]   0.004017      device=eth0, addr=192.168.23.197, 
   mask=255.255.0.0, gw=192.168.23.2,
   [  6.614588]   0.007950      host=192.168.23.197, domain=, 
   nis-domain=(none),
   [  6.618652]   0.004064      bootserver=192.168.23.2, 
   rootserver=192.168.23.2, rootpath=
  
  Well, this ~1 second is not really kernel's fault, it's DHCP delay.
  But, do you need to do it at this moment?
  You do not seem to be using networking filesystems.
  You can run DHCP client in userspace.
 
 The board has ip autoconfig configured in, because we also use tftp/nfs
 boot for development. But it had been disabled on the commandline:
 
 ip=192.168.23.197:192.168.23.2:192.168.23.2:255.255.0.0:::
 
 That shouldn't do dhcp, right?

Try to boot with the eth cable unplugged and see if it hangs in IP-Config.
If it were doing static configuration it would be faster.

However, unless you need ethernet to boot (NFS root), I'd suggest
doing the eth config in userspace.
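
I.e. drop ip= from the kernel command line and do the equivalent from an
early init script, e.g. with busybox (addresses taken from your ip= line):

  ifconfig eth0 192.168.23.197 netmask 255.255.0.0 up
  route add default gw 192.168.23.2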


   [  7.137924]   0.059316 starting udev
   [  7.147925]   0.010001 mounting tmpfs at /dev
   [  7.182299]   0.034374 creating static nodes
   [  7.410613]   0.228314 starting udevd...done
   [  8.811097]   1.400484 waiting for devices...done

And suddenly devtmpfs sounds like a good idea ;-)

We use static device nodes during boot, and later
set up busybox mdev for hotplug.
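
The mdev setup itself is only a few lines in the init script (this is the
usual recipe from the busybox documentation):

  mount -t proc proc /proc
  mount -t sysfs sysfs /sys
  echo /sbin/mdev > /proc/sys/kernel/hotplug
  mdev -s        # create nodes for the devices that already exist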


Johannes


Re: Recommendation for activating a deferred module init in the kernel

2008-06-18 Thread Johannes Stezenbach
On Wed, Jun 18, 2008, Stefan Richter wrote:
 Johannes Stezenbach wrote:
 I think the USB bus enumeration can take significant time:
 recognize a device is connected, turn on bus power, try
 to read descriptors (bus powered devices might be slow to
 respond after power up). And this will happen even with
 drivers_autoprobe == 0, right?

 Probably... I don't know which particular steps happen in the USB core  
 before upper layer drivers are bound.  Not binding the [eou]hci-hcd PCI  
 driver would certainly be more effective.

Well, in embedded systems you often don't have a PCI bus,
but platform devices. Maybe it's as simple as delaying
the USB platform_device_register() call, I don't know.

Johannes


Re: Recommendation for activating a deferred module init in the kernel

2008-06-17 Thread Johannes Stezenbach
On Wed, Jun 18, 2008 at 12:48:27AM +0200, Stefan Richter wrote:
 On Tue, 17 June 2008 12:55:31 -0700, Tim Bird wrote:
 On Tue, 17 Jun 2008 11:28:29 -0700, Tim Bird wrote:
 | One of the main sub-systems that we defer initialization of this
 | way is USB, and this saves quite a bit of time.  (Of course the
 | same, or slightly more CPU cycles are eventually used during
 | bootup time.  But this lets us get to user space quicker so we
 | can start user-visible applications faster.)

 What if you don't defer module initialization, but merely device probing?
...
 If you set /sys/bus/foo/drivers_autoprobe to 0 (default is 1), then a  
 /sys/bus/foo/drivers/bar will not be bound to devices.  You can trigger  
 driver--device binding later per device by writing a device's bus ID  
 into /sys/bus/foo/drivers/bar/bind, or by writing into  
 /sys/bus/foo/drivers_probe (I guess; I only used the per-device way so 
 far).

I think the USB bus enumeration can take significant time:
recognize a device is connected, turn on bus power, try
to read descriptors (bus powered devices might be slow to
respond after power up). And this will happen even with
drivers_autoprobe == 0, right?
OTOH I think just calling the module init function when no
devices are present on the bus doesn't need much time.

If you could delay the enumeration it would not be necessary
to mess with drivers_autoprobe.  However, I don't know enough
about USB, so I don't know how to do it...
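
For reference, the sysfs knobs Stefan described would be used roughly like
this for USB (the interface ID and driver name are just examples):

  # before the bus gets populated
  echo 0 > /sys/bus/usb/drivers_autoprobe
  # ... later, once the boot-critical stuff is up, bind by hand
  echo 1-1:1.0 > /sys/bus/usb/drivers/usb-storage/bind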


Johannes


Re: cross-compiling alternatives (was Re: [PATCH 0/1] Embedded Maintainer(s)...)

2008-06-13 Thread Johannes Stezenbach
On Fri, Jun 13, 2008, Tim Bird wrote:
 
 YMMV.  I put some of the resources and info I found at:
 http://elinux.org/Debugging_Makefiles

There is also remake, which is "a patched GNU make with a debugger,
better tracing and error reporting (based on GNU make 3.80)".
Development seems to have stopped, though.
http://sourceforge.net/projects/bashdb/

Johannes