On 22 Apr 2014, at 15:10, Siarhei Siamashka <[email protected]> wrote:

> On Thu, 3 Apr 2014 03:15:06 +0530
> Rajesh Mallah <[email protected]> wrote:
>
>> On Thu, Mar 27, 2014 at 12:12 AM, Siarhei Siamashka <[email protected]> wrote:
>>
>>> On Tue, 25 Mar 2014 03:23:54 +0530
>>> Rajesh Mallah <[email protected]> wrote:
>>>
>>>> I also observed that a clone of the rootfs from Mele M3 to another
>>>> A20 based TB Box consistently performed slower than Mele M3.
>>>>
>>>> MeleM3 : 18.95 secs
>>>> Other A20: 25 secs
>>>>
>>>> the dump from a10-meminfo-static were same in both the cases
>>>> except for dram_zq param
>>>>
>>>> Can anyone pls explain why the difference in the A20 based boards
>>>> itself ?
>>>
>>> To profile this use case, we can run the following command:
>>>
>>>     $ DISPLAY=:0 perf record -e cpu-clock -a gtkperf -a
>>>
>>> This instructed perf to collect statistics for the whole system
>>> from all CPU cores while gtkperf is running. Now after we have all
>>> the statistics collected, we can check the percentage of CPU usage
>>> for different processes:
>>>
>>>     $ perf report -s pid
>>>
>>>     49.02%  gtkperf:19651
>>>     30.54%  Xorg:19603
>>>     18.75%  swapper: 0
>>>      0.69%  kworker/0:1:19569
>>>      0.32%  xkbcomp:19656
>>>      0.31%  xkbcomp:19655
>>>      0.12%  perf:19650
>>>
>>> This means that some of the time the CPU cores were idle (swapper). The
>>> CPU usage in gtkperf is almost twice higher than in Xorg. You can also
>>> run 'perf report' to see the time spent in each individual function (if
>>> you have debugging symbols).
>>>
>>> Now there is indeed one strange thing. If I run 'htop' while gtkperf
>>> is running, I can sometimes see that only one CPU core is fully loaded
>>> while the other is completely idle. And both gtkperf and Xorg processes
>>> are running on the same fully loaded CPU core.
>>>
>>> As an experiment (on an Allwinner A20 based Cubietruck board), we can
>>> try pinning gtkperf and Xorg processes to CPU cores.
>>> Start Xorg and pin it to the CPU core 0:
>>>
>>>     # taskset -c 0 Xorg
>>>
>>> Then run gtkperf pinned to the same CPU 0 core as Xorg:
>>>
>>>     $ DISPLAY=:0 taskset -c 0 gtkperf -a
>>>
>>>     Total time: 26.78
>>>
>>> And also pinned to a different CPU 1 core for comparison:
>>>
>>>     $ DISPLAY=:0 taskset -c 1 gtkperf -a
>>>
>>>     Total time: 19.44
>>>
>>> When Xorg and gtkperf are running on different CPU cores, the
>>> performance is better. Without using taskset to pin processes to
>>> CPU cores, gtkperf result is somewhere between these 19.44 and
>>> 26.78 times, typically closer to the latter one.
>>>
>>> It basically looks like the CFS scheduler in the linux-sunxi 3.4.79
>>> kernel is not doing a stellar job for gtkperf.
>>>
>>> However a similar gtkperf behaviour can be also observed on
>>> ARM Chromebook (dual-core Cortex-A15 1.7GHz), when using exactly
>>> the same rootfs:
>>>
>>>     Total time: 9.35 (just run gtkperf without any tweaks)
>>>     Total time: 9.82 (Xorg and gtkperf pinned to the same CPU core)
>>>     Total time: 7.11 (Xorg and gtkperf pinned to different CPU cores)
>
>> Dear Siamashka & List ,
>>
>> executing Xorg and gtkperf in different cpus using taskset does makes a
>> difference. It was possible to cut down from 25secs to 17secs. Now I am
>> happy with my new toy (board) :)
>>
>> I will also take a closer look at your other suggestions of investigation.
>
> BTW, forgot to mention that I also have tried the BFS scheduler
> just for fun. And pushed the patches to this branch:
> https://github.com/ssvb/linux-sunxi/tree/sunxi-3.4.79-bfs
>
> And it fixes at least this particular gtkperf issue. When the BFS
> patches are applied to sunxi-3.4, the reported gtkperf score is
> always good even without using explicit taskset.
>
> So it indeed looks like either CFS in general, or CFS in linux 3.4, or
> CFS on ARM does not behave really well for this type of workload.
>
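
For anyone still on the stock CFS kernel who wants to repeat the pinning comparison without retyping the taskset lines, a small wrapper along these lines might do. This is only a sketch: the `pin` helper and the `DRY_RUN` variable are names invented for this example, not anything from the sunxi tree, and only the plain `taskset -c CORE CMD` form from the quoted commands is used.

```shell
#!/bin/sh
# Sketch only. pin() and DRY_RUN are names made up for this example;
# taskset itself is the util-linux tool used in the quoted commands.

# pin CORE CMD [ARGS...]
# Run CMD restricted to CPU core CORE. With DRY_RUN=1, only print the
# taskset command line instead of executing it.
pin() {
    core=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "taskset -c $core $*"
    else
        taskset -c "$core" "$@"
    fi
}
```

With Xorg already started under `taskset -c 0` as in the quoted text, `export DISPLAY=:0` followed by `pin 1 gtkperf -a` repeats the faster case, and `pin 0 gtkperf -a` the slower one. If Xorg was started by a display manager instead, `taskset -cp 0 <pid of Xorg>` should move the running server to core 0, though that is untested here.
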
> I have also tried to run some gtkperf tests on an x86 laptop running
> linux 3.4 with xf86-video-fbdev driver, but could not reproduce a
> similar problem. However unlike what we observed on ARM, the client
> side gtkperf process does not hugely dominate in CPU usage anymore.
> On x86, the CPU usage is roughly evenly distributed between the gtkperf
> process and the Xorg process. This could be the reason why the same
> strange CFS scheduler behaviour is not triggered there.
>
> About the BFS scheduler in general. It might be that its popularity
> among the low latency responsive linux desktop fans is actually well
> deserved :-)
>
> I would even propose to apply the BFS patches to the sunxi-3.4 kernel
> (ensuring that we have a superior X11 desktop performance is always
> nice), but read somewhere that BFS does not play very well with
> lennartware. Are there many systemd users here?

*raises hand*
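
On the ARM versus x86 comparison: the client/server split can be boiled down to one number by dividing the gtkperf share by the Xorg share in the `perf report -s pid` output. A rough sketch follows; the `usage_ratio` helper name is made up for this example, and it assumes the report lines look like the ones quoted in the thread (a percentage, then `command:pid`).

```shell
#!/bin/sh
# usage_ratio A B
# Read `perf report -s pid`-style lines on stdin and print the summed CPU
# share of command A divided by that of command B, to two decimals.
# Hypothetical helper; the line format is assumed from the quoted output.
usage_ratio() {
    awk -v a="$1" -v b="$2" '
        $1 ~ /%$/ {
            pct = $1;  sub(/%$/, "", pct)     # "49.02%" -> "49.02"
            name = $2; sub(/:.*/, "", name)   # "gtkperf:19651" -> "gtkperf"
            share[name] += pct                # several pids may share a name
        }
        END {
            if (share[b] > 0) {
                printf "%.2f\n", share[a] / share[b]
            } else {
                print "no samples for " b > "/dev/stderr"
                exit 1
            }
        }'
}
```

Something like `perf report -s pid --stdio | usage_ratio gtkperf Xorg` would then give the ratio for whatever perf.data is in the current directory. Running it on both machines puts the "gtkperf dominates on ARM, roughly even on x86" observation in numbers.
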
