On Tue, 25 Mar 2014 03:23:54 +0530
Rajesh Mallah <mallah.raj...@gmail.com> wrote:

> I also observed that a clone of the rootfs from Mele M3 to another
> A20 based TB Box consistently performed slower than Mele M3.
> 
> MeleM3 :    18.95 secs
> Other A20:  25      secs
> 
> the dump from a10-meminfo-static were same in both the cases
> except for dram_zq param
> 
> Can anyone pls explain why the difference in the A20 based boards
> itself  ?

To profile this use case, we can run the following command:

$ DISPLAY=:0 perf record -e cpu-clock -a gtkperf -a

This instructs perf to collect statistics for the whole system,
from all CPU cores, while gtkperf is running. Once the statistics
are collected, we can check the percentage of CPU usage for the
different processes:

$ perf report -s pid

    49.02%                gtkperf:19651
    30.54%                   Xorg:19603
    18.75%                swapper:    0
     0.69%            kworker/0:1:19569
     0.32%                xkbcomp:19656
     0.31%                xkbcomp:19655
     0.12%                   perf:19650

This means that the CPU cores were idle (swapper) for part of the time.
gtkperf uses about 1.6 times as much CPU time as Xorg. You can also run
plain 'perf report' to see the time spent in each individual function
(if you have debugging symbols).

Now there is indeed one strange thing. If I run 'htop' while gtkperf
is running, I can sometimes see that only one CPU core is fully loaded
while the other is completely idle. And both gtkperf and Xorg processes
are running on the same fully loaded CPU core.
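
The same observation can be made without htop. Below is a minimal
sketch, assuming a Linux /proc filesystem and that 'pidof' is installed;
the helper name last_cpu_of is made up for illustration. It prints the
core each process was last scheduled on, so it has to be run repeatedly
while gtkperf is working to see whether both processes stay together:

```shell
# Print the CPU core a process last ran on. This is field 39
# ("processor") of /proc/PID/stat, as documented in proc(5).
last_cpu_of() {
    stat=$(cat "/proc/$1/stat") || return 1
    # Strip the leading "pid (comm) " part; comm may contain spaces or ')'.
    rest=${stat##*) }
    # rest now starts at field 3 (state), so field 39 is word 37 here.
    echo "$rest" | awk '{ print $37 }'
}

# Report the last-used core for every running Xorg and gtkperf process.
for name in Xorg gtkperf; do
    for pid in $(pidof "$name" 2>/dev/null); do
        echo "$name (pid $pid) last ran on CPU $(last_cpu_of "$pid")"
    done
done
```

If both lines keep showing the same CPU number, the two processes are
sharing one core.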

As an experiment (on an Allwinner A20 based Cubietruck board), we can
try pinning the gtkperf and Xorg processes to CPU cores. Start Xorg and
pin it to CPU core 0:

    # taskset -c 0 Xorg

Then run gtkperf pinned to the same CPU core 0 as Xorg:

    $ DISPLAY=:0 taskset -c 0 gtkperf -a

    Total time: 26.78

And, for comparison, run it pinned to a different core (CPU core 1):

    $ DISPLAY=:0 taskset -c 1 gtkperf -a

    Total time: 19.44

When Xorg and gtkperf are running on different CPU cores, the
performance is better. Without using taskset to pin the processes to
CPU cores, the gtkperf result is somewhere between 19.44 and 26.78
seconds, typically closer to the latter.

It basically looks like the CFS scheduler in the linux-sunxi 3.4.79
kernel is not doing a stellar job for gtkperf.

However, similar gtkperf behaviour can also be observed on an
ARM Chromebook (dual-core Cortex-A15 at 1.7GHz) when using exactly
the same rootfs:

Total time:  9.35 (just run gtkperf without any tweaks)
Total time:  9.82 (Xorg and gtkperf pinned to the same CPU core)
Total time:  7.11 (Xorg and gtkperf pinned to different CPU cores)
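
To put the two boards side by side, here is a small sketch that turns
the "Total time" values above into a relative slowdown for the
shared-core case. The function name slowdown_pct is just an illustrative
helper, and the only inputs are the four timings quoted in this message:

```shell
# Relative slowdown (in percent) of running Xorg and gtkperf on one
# core compared to running them on separate cores.
slowdown_pct() {
    # $1 = total time on a shared core, $2 = total time on separate cores
    awk -v shared="$1" -v apart="$2" \
        'BEGIN { printf "%.1f\n", (shared / apart - 1) * 100 }'
}

echo "Cubietruck (A20): $(slowdown_pct 26.78 19.44)% slower on a shared core"
echo "Chromebook (A15): $(slowdown_pct 9.82 7.11)% slower on a shared core"
```

This gives roughly 38% for both boards, which suggests that the penalty
for sharing a core is not specific to the A20.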

-- 
Best regards,
Siarhei Siamashka
