On Thu, 3 Apr 2014 03:15:06 +0530
Rajesh Mallah <[email protected]> wrote:

> On Thu, Mar 27, 2014 at 12:12 AM, Siarhei Siamashka <[email protected]> wrote:
> 
> > On Tue, 25 Mar 2014 03:23:54 +0530
> > Rajesh Mallah <[email protected]> wrote:
> >
> > > I also observed that a clone of the rootfs from Mele M3 to another
> > > A20 based TV box consistently performed slower than Mele M3.
> > >
> > > MeleM3 :    18.95 secs
> > > Other A20:  25      secs
> > >
> > > The dumps from a10-meminfo-static were the same in both cases
> > > except for the dram_zq param.
> > >
> > > Can anyone pls explain why there is a difference between the A20
> > > based boards themselves?
> >
> > To profile this use case, we can run the following command:
> >
> > $ DISPLAY=:0 perf record -e cpu-clock -a gtkperf -a
> >
> > This instructs perf to collect statistics for the whole system
> > from all CPU cores while gtkperf is running. Once all the
> > statistics are collected, we can check the percentage of CPU usage
> > for different processes:
> >
> > $ perf report -s pid
> >
> >     49.02%                gtkperf:19651
> >     30.54%                   Xorg:19603
> >     18.75%                swapper:    0
> >      0.69%            kworker/0:1:19569
> >      0.32%                xkbcomp:19656
> >      0.31%                xkbcomp:19655
> >      0.12%                   perf:19650
> >
> > This means that some of the time the CPU cores were idle (swapper). The
> > CPU usage in gtkperf is about 1.6 times higher than in Xorg. You can also
> > run 'perf report' to see the time spent in each individual function (if
> > you have debugging symbols).
> >
> > Now there is indeed one strange thing. If I run 'htop' while gtkperf
> > is running, I can sometimes see that only one CPU core is fully loaded
> > while the other is completely idle. And both gtkperf and Xorg processes
> > are running on the same fully loaded CPU core.
> >
> > As an experiment (on an Allwinner A20 based Cubietruck board), we can
> > try pinning gtkperf and Xorg processes to CPU cores. Start Xorg and pin
> > it to the CPU core 0:
> >
> >     # taskset -c 0 Xorg
> >
> > Then run gtkperf pinned to the same CPU 0 core as Xorg:
> >
> >     $ DISPLAY=:0 taskset -c 0 gtkperf -a
> >
> >     Total time: 26.78
> >
> > And also pinned to a different CPU 1 core for comparison:
> >
> >     $ DISPLAY=:0 taskset -c 1 gtkperf -a
> >
> >     Total time: 19.44
> >
> > When Xorg and gtkperf are running on different CPU cores, the
> > performance is better. Without using taskset to pin processes to
> > CPU cores, gtkperf result is somewhere between these 19.44 and
> > 26.78 times, typically closer to the latter one.
> >
> > It basically looks like the CFS scheduler in the linux-sunxi 3.4.79
> > kernel is not doing a stellar job for gtkperf.
> >
> > However, similar gtkperf behaviour can also be observed on an
> > ARM Chromebook (dual-core Cortex-A15 1.7GHz), when using exactly
> > the same rootfs:
> >
> > Total time:  9.35 (just run gtkperf without any tweaks)
> > Total time:  9.82 (Xorg and gtkperf pinned to the same CPU core)
> > Total time:  7.11 (Xorg and gtkperf pinned to different CPU cores)

> Dear Siamashka & List,
> 
> executing Xorg and gtkperf on different CPUs using taskset does make a
> difference. It was possible to cut the time down from 25 secs to 17 secs.
> Now I am happy with my new toy (board)  :)
> 
> I will also take a closer look at your other suggestions of investigation.
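
To put the timings quoted above side by side, the slowdown factors can
be computed with a small awk helper. This is only a sketch: the function
name 'slowdown' is made up for illustration and is not part of gtkperf
or any other tool.

```shell
# slowdown SLOW FAST: print how many times longer the SLOW run took.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}
```

With the numbers from this thread, 'slowdown 26.78 19.44' (Cubietruck)
and 'slowdown 9.82 7.11' (Chromebook) both print 1.38, and
'slowdown 25 17' for the other A20 box prints 1.47.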

BTW, I forgot to mention that I have also tried the BFS scheduler
just for fun, and pushed the patches to this branch:
    https://github.com/ssvb/linux-sunxi/tree/sunxi-3.4.79-bfs

And it fixes at least this particular gtkperf issue. When the BFS
patches are applied to sunxi-3.4, the reported gtkperf score is
always good even without using explicit taskset.
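
For kernels that stay on CFS, the taskset workaround can probably also
be applied to an X server that is already running, because taskset
accepts a PID with the -p option. A rough sketch, where the function
name run_on_separate_cores is hypothetical and the core numbers in the
example are just the ones used in the experiment quoted above:

```shell
# run_on_separate_cores SERVER_PID SERVER_CORE CLIENT_CORE CMD [ARGS...]
# Pin the running server process to SERVER_CORE, then start the client
# command pinned to CLIENT_CORE.
run_on_separate_cores() {
    server_pid="$1"; server_core="$2"; client_core="$3"
    shift 3
    # -p changes the affinity of an existing process, -c takes a core list
    taskset -cp "$server_core" "$server_pid" >/dev/null || return 1
    # Without -p, taskset launches the given command with that affinity
    taskset -c "$client_core" "$@"
}
```

Example, assuming a single Xorg process and enough privileges to change
its affinity:

    DISPLAY=:0 run_on_separate_cores "$(pidof Xorg)" 0 1 gtkperf -a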

So it indeed looks like either CFS in general, or CFS in linux 3.4, or
CFS on ARM does not behave really well for this type of workload.
I have also tried to run some gtkperf tests on an x86 laptop running
linux 3.4 with xf86-video-fbdev driver, but could not reproduce a
similar problem. However unlike what we observed on ARM, the client
side gtkperf process does not hugely dominate in CPU usage anymore.
On x86, the CPU usage is roughly evenly distributed between the gtkperf
process and the Xorg process. This could be the reason why the same
strange CFS scheduler behaviour is not triggered there.
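
One way to quantify that difference is to take the ratio of the two
processes' shares from the 'perf report -s pid' listing. A hedged
sketch, assuming the text output (selected with --stdio) has the same
"percent name:pid" layout as the listing quoted earlier, which may
differ between perf versions. The name cpu_share_ratio is illustrative.

```shell
# cpu_share_ratio A B: read 'perf report -s pid --stdio' output on stdin
# and print share(A) / share(B), summing over all PIDs with that name.
cpu_share_ratio() {
    awk -v a="$1" -v b="$2" '
        /^#/ { next }                     # skip perf header comments
        NF >= 2 {
            split($2, parts, ":")         # "gtkperf:19651" -> "gtkperf"
            share[parts[1]] += $1 + 0     # "49.02%" -> 49.02
        }
        END {
            if (share[b] > 0) { printf "%.2f\n", share[a] / share[b] }
            else { exit 1 }               # B not found in the report
        }'
}
```

Usage: perf report -s pid --stdio | cpu_share_ratio gtkperf Xorg
For the ARM listing quoted above (49.02% vs 30.54%) this prints 1.61.
A value near 1 would correspond to the even split seen on x86.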

About the BFS scheduler in general. It might be that its popularity
among the low latency responsive linux desktop fans is actually well
deserved :-)

I would even propose to apply the BFS patches to the sunxi-3.4 kernel
(ensuring that we have superior X11 desktop performance is always
nice), but I read somewhere that BFS does not play very well with
lennartware. Are there many systemd users here?

-- 
Best regards,
Siarhei Siamashka
