On 22 Apr 2014, at 15:10, Siarhei Siamashka <[email protected]> wrote:

> On Thu, 3 Apr 2014 03:15:06 +0530
> Rajesh Mallah <[email protected]> wrote:
> 
>> On Thu, Mar 27, 2014 at 12:12 AM, Siarhei Siamashka <
>> [email protected]> wrote:
>> 
>>> On Tue, 25 Mar 2014 03:23:54 +0530
>>> Rajesh Mallah <[email protected]> wrote:
>>> 
>>>> I also observed that a clone of the rootfs from the Mele M3 to another
>>>> A20 based TV box consistently performed slower than on the Mele M3.
>>>> 
>>>> MeleM3 :    18.95 secs
>>>> Other A20:  25      secs
>>>> 
>>>> The dumps from a10-meminfo-static were the same in both cases
>>>> except for the dram_zq param.
>>>> 
>>>> Can anyone please explain why there is a difference between the
>>>> A20 based boards themselves?
>>> 
>>> To profile this use case, we can run the following command:
>>> 
>>> $ DISPLAY=:0 perf record -e cpu-clock -a gtkperf -a
>>> 
>>> This instructs perf to collect statistics for the whole system
>>> from all CPU cores while gtkperf is running. Once all the
>>> statistics are collected, we can check the percentage of CPU usage
>>> for different processes:
>>> 
>>> $ perf report -s pid
>>> 
>>>    49.02%                gtkperf:19651
>>>    30.54%                   Xorg:19603
>>>    18.75%                swapper:    0
>>>     0.69%            kworker/0:1:19569
>>>     0.32%                xkbcomp:19656
>>>     0.31%                xkbcomp:19655
>>>     0.12%                   perf:19650
>>> 
>>> This means that for some of the time the CPU cores were idle (swapper).
>>> The CPU usage of gtkperf is roughly 1.6 times that of Xorg (49.02% vs.
>>> 30.54%). You can also run 'perf report' to see the time spent in each
>>> individual function (if you have debugging symbols).
>>> 
>>> Now there is indeed one strange thing. If I run 'htop' while gtkperf
>>> is running, I can sometimes see that only one CPU core is fully loaded
>>> while the other is completely idle. And both gtkperf and Xorg processes
>>> are running on the same fully loaded CPU core.
>>> 
>>> As an experiment (on an Allwinner A20 based Cubietruck board), we can
>>> try pinning gtkperf and Xorg processes to CPU cores. Start Xorg and pin
>>> it to the CPU core 0:
>>> 
>>>    # taskset -c 0 Xorg
>>> 
>>> Then run gtkperf pinned to the same CPU core 0 as Xorg:
>>> 
>>>    $ DISPLAY=:0 taskset -c 0 gtkperf -a
>>> 
>>>    Total time: 26.78
>>> 
>>> And also pinned to a different core, CPU core 1, for comparison:
>>> 
>>>    $ DISPLAY=:0 taskset -c 1 gtkperf -a
>>> 
>>>    Total time: 19.44
>>> 
>>> When Xorg and gtkperf are running on different CPU cores, the
>>> performance is better. Without using taskset to pin processes to
>>> CPU cores, the gtkperf result falls somewhere between 19.44 and
>>> 26.78 seconds, typically closer to the latter.
>>> 
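The commands quoted above start Xorg and gtkperf under taskset. For an X
server that is already running, a minimal sketch along these lines should
work as well, assuming the util-linux taskset and a Linux /proc filesystem.
The helper names pin_to_core and allowed_cores are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helpers (the names are illustrative only): pin an already
# running process to one CPU core and read back its allowed-core list.

pin_to_core() {
    # $1 = PID, $2 = core number. "taskset -cp" changes the affinity of
    # an existing task instead of launching a new command.
    taskset -cp "$2" "$1" > /dev/null
}

allowed_cores() {
    # $1 = PID. The kernel reports the affinity as a list such as
    # "0" or "0-1" in /proc/<pid>/status.
    awk '/^Cpus_allowed_list:/ { print $2 }' "/proc/$1/status"
}

# Example (assumes Xorg is running and pidof is installed):
#   pin_to_core "$(pidof Xorg)" 0
#   allowed_cores "$(pidof Xorg)"
```

Note that taskset -p only changes the task whose PID is given. If the
process has several threads, taskset's -a option applies the change to
all of them.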
>>> It basically looks like the CFS scheduler in the linux-sunxi 3.4.79
>>> kernel is not doing a stellar job for gtkperf.
>>> 
>>> However, similar gtkperf behaviour can also be observed on an
>>> ARM Chromebook (dual-core Cortex-A15 1.7GHz), when using exactly
>>> the same rootfs:
>>> 
>>> Total time:  9.35 (just run gtkperf without any tweaks)
>>> Total time:  9.82 (Xorg and gtkperf pinned to the same CPU core)
>>> Total time:  7.11 (Xorg and gtkperf pinned to different CPU cores)
> 
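As a quick sanity check on the totals quoted above, here is a small sketch
that computes how much slower the "same core" run is compared with the
"different cores" run. The slowdown helper is my own naming, not part of
gtkperf or any other tool:

```shell
#!/bin/sh
# Ratio of the "same core" time to the "different cores" time.
# A value above 1 means that sharing one core is slower.
slowdown() {
    # $1 = total time with Xorg and gtkperf on the same core
    # $2 = total time with Xorg and gtkperf on different cores
    awk -v same="$1" -v apart="$2" 'BEGIN { printf "%.2f\n", same / apart }'
}

slowdown 26.78 19.44   # Cubietruck (A20)             -> 1.38
slowdown 9.82 7.11     # ARM Chromebook (Cortex-A15)  -> 1.38
```

Both boards come out at about 1.38, so the relative penalty for sharing a
core is nearly the same on the A20 and on the Cortex-A15.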
>> Dear Siamashka & List,
>> 
>> Executing Xorg and gtkperf on different CPUs using taskset does make a
>> difference. It was possible to cut the time down from 25 secs to 17 secs.
>> Now I am happy with my new toy (board)  :)
>> 
>> I will also take a closer look at your other suggestions of investigation.
> 
> BTW, forgot to mention that I also have tried the BFS scheduler
> just for fun. And pushed the patches to this branch:
>    https://github.com/ssvb/linux-sunxi/tree/sunxi-3.4.79-bfs
> 
> And it fixes at least this particular gtkperf issue. When the BFS
> patches are applied to sunxi-3.4, the reported gtkperf score is
> always good even without using explicit taskset.
> 
> So it indeed looks like either CFS in general, or CFS in linux 3.4, or
> CFS on ARM does not behave really well for this type of workload.
> I have also tried to run some gtkperf tests on an x86 laptop running
> linux 3.4 with xf86-video-fbdev driver, but could not reproduce a
> similar problem. However unlike what we observed on ARM, the client
> side gtkperf process does not hugely dominate in CPU usage anymore.
> On x86, the CPU usage is roughly evenly distributed between the gtkperf
> process and the Xorg process. This could be the reason why the same
> strange CFS scheduler behaviour is not triggered there.
> 
> About the BFS scheduler in general. It might be that its popularity
> among the low latency responsive linux desktop fans is actually well
> deserved :-)
> 
> I would even propose to apply the BFS patches to the sunxi-3.4 kernel
> (ensuring that we have a superior X11 desktop performance is always
> nice), but read somewhere that BFS does not play very well with
> lennartware. Are there many systemd users here?

*raises hand*
