Hi, I think the problem is caused by the ehci driver rather than the ARM SMP scheduler, and I have verified that the patch below fixes it. The patch has also been posted to the usb/arm/omap mailing lists for discussion.
However, I wonder why the problem does not occur when 'nosmp' is passed to the kernel.

From d85a08714ed23ec8688013b464dc90c6386db0d8 Mon Sep 17 00:00:00 2001
From: Ming Lei <[email protected]>
Date: Sat, 27 Aug 2011 22:29:15 +0800
Subject: [PATCH] usb: ehci: fix update qtd->token in qh_append_tds

This patch fixes one performance bug on ARM Cortex A9 dual core platform,
which has been reported on quite a few ARM machines (OMAP4, Tegra 2,
snowball...), see details from link of
https://bugs.launchpad.net/bugs/709245.

In fact, one mb() on ARM is enough to flush L2 cache, but
'token = dummy->hw_token;' after mb() is added just for obeying
correct mb() usage.

The patch has been tested ok on OMAP4 panda A1 board, the performance
of 'dd' over usb mass storage can be increased from 4~5MB/sec to
14~16MB/sec after applying this patch.

Signed-off-by: Ming Lei <[email protected]>
---
 drivers/usb/host/ehci-q.c |   14 ++++++++++++++
 1 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0917e3a..65b5021 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -1082,6 +1082,20 @@ static struct ehci_qh *qh_append_tds (
 			wmb ();
 			dummy->hw_token = token;
 
+			/* The mb() below is added to make sure that
+			 * 'token' can be written into qtd, so that ehci
+			 * HC can see the up-to-date qtd descriptor. On
+			 * some archs (at least on ARM Cortex A9 dual core),
+			 * writing into coherent memory doesn't mean the
+			 * value written can reach physical memory
+			 * immediately, and the value may be buffered
+			 * inside L2 cache. 'token = dummy->hw_token;'
+			 * after mb() is added for obeying correct mb()
+			 * usage.
+			 * */
+			mb();
+			token = dummy->hw_token;
+
 			urb->hcpriv = qh_get (qh);
 		}
 	}
-- 
1.7.4.1

-- 
You received this bug notification because you are a member of Linaro Release Team, which is subscribed to the bug report.
https://bugs.launchpad.net/bugs/709245

Title:
  ARM SMP scheduler performance bug

Status in Linaro Ubuntu Evaluation Builds:
  Confirmed
Status in Linaro Linux:
  Confirmed
Status in OEM Priority Project:
  New
Status in “linux-ti-omap4” package in Ubuntu:
  Confirmed
Status in “linux-ti-omap4” source package in Maverick:
  Confirmed
Status in “linux-ti-omap4” source package in Natty:
  Confirmed
Status in “linux-ti-omap4” source package in Oneiric:
  Confirmed

Bug description:
  Original Bug name: "panda: USB disk IO slow"

  This bug affects ARM Cortex A9 cores: snowball, nvidia, OMAP 4, and
  other Cortex A9 processors. The problem also appears in Fedora ARM
  builds, so it is not limited to Ubuntu.

  My Panda's USB seems to be significantly slower than a Beagle C4.
  hdparm shows buffered reads as ~12MB/s on the Panda, and about
  ~20-25MB/s on a Beagle C4 from the same external Lacie USB disk.

  Kernel is 2.6.37-1002-linaro-omap

  Disk shows as:
  [ 5.170440] scsi 0:0:0:0: Direct-Access LaCie d2 quadra PQ: 0 ANSI: 4
  [ 5.172546] sd 0:0:0:0: Attached scsi generic sg0 type 0
  [ 5.175415] sd 0:0:0:0: [sda] 976773168 512-byte logical blocks: (500 GB/465 GiB)

  The board is otherwise idle during the test.

  Doing
    perf_2.6.37-12 record -a dd if=/dev/sda of=/dev/null bs=4096 count=100000
  shows:

    81.41%  swapper         [kernel.kallsyms]  [k] default_idle
     6.33%  dd              [kernel.kallsyms]  [k] __copy_to_user
     0.94%  swapper         [kernel.kallsyms]  [k] cpu_idle
     0.51%  dd              [kernel.kallsyms]  [k] __make_request
     0.51%  perf_2.6.37-12  [kernel.kallsyms]  [k] __copy_from_user

  which suggests it's not CPU constrained.

  Dave
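
For anyone following the thread, the ordering that the patch in this message depends on can be sketched in user space roughly as follows. This is only an illustration under assumptions: struct demo_qtd, DEMO_ACTIVE and demo_publish() are hypothetical names, not the ehci driver's, and C11 fences stand in for the kernel's wmb()/mb(). C11 fences only order memory accesses between CPU threads; they do not model when a store becomes visible to a DMA-capable host controller, which is the part the kernel mb() is meant to address on the Cortex A9 boards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a transfer descriptor; NOT the real struct ehci_qtd. */
struct demo_qtd {
	uint32_t next;           /* link to the following descriptor */
	uint32_t buf;            /* payload address */
	_Atomic uint32_t token;  /* status word; DEMO_ACTIVE hands it to the consumer */
};

#define DEMO_ACTIVE 0x80u    /* assumed "descriptor is live" bit for this sketch */

/*
 * Fill in the descriptor, then publish it by writing the token last.
 * Returns the token as read back after the full fence.
 */
static uint32_t demo_publish(struct demo_qtd *d, uint32_t next,
			     uint32_t buf, uint32_t token)
{
	assert(d != NULL);

	/* 1. Write everything the consumer will read once the token is live. */
	d->next = next;
	d->buf = buf;

	/* 2. Stand-in for wmb(): the stores above are ordered before the token store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->token, token, memory_order_relaxed);

	/* 3. Stand-in for the mb() the patch adds after the token store. */
	atomic_thread_fence(memory_order_seq_cst);

	/* 4. Read the token back, mirroring 'token = dummy->hw_token;'. */
	return atomic_load_explicit(&d->token, memory_order_relaxed);
}
```

Calling demo_publish() on a zeroed struct demo_qtd returns the token value that was just stored. The point of the sketch is only the sequence (payload, write barrier, token, full barrier, read back); whether the full barrier also drains an outer cache is a property of the kernel's mb() on a given platform, not of this code.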

