On Tue, 17 Dec 2013 10:15:56 -0800 (PST)
Димитър Гамишев <[email protected]> wrote:

> On Monday, December 16, 2013 6:44:03 AM UTC+2, Siarhei Siamashka wrote:
> > There is just one thing I'm really worried about. The 16-bit 
> > memory interface is a major performance risk factor. I wonder 
> > how LIME performs on memory intensive workloads (such as 
> > graphics) when compared with, for example, Cubieboard. 

Now that I have an A10-OLinuXino-Lime device (thanks Tsvetan!), I could
run some OpenGL ES benchmarks in X11 with the Mali r3p0 binary drivers.
Mele A2000 and Cubieboard1 devices are used for comparison because
both have a 32-bit memory interface, but different memory clock speeds.
The default memory timings configuration for LIME is using dram_cas=9
set in dram_a10_olinuxino_l.c in u-boot, but I also tried the
cubieboard memory timings (dram_cas=6) as an extra test just to
see how it may affect performance.

== The final score for glmark2-es2 2012.12 (test in 800x600 window) ==

LIME (CAS=9) - 480MHz dram clock, dram_bus_width=16, dram_cas=9
LIME (CAS=6) - 480MHz dram clock, dram_bus_width=16, dram_cas=6
Mele A2000   - 360MHz dram clock, dram_bus_width=32, dram_cas=6
Cubieboard1  - 480MHz dram clock, dram_bus_width=32, dram_cas=6

In all cases the ARM Cortex-A8 in the Allwinner A10 is clocked at
1008MHz (performance cpufreq governor) and the Mali400 MP1 is clocked
at 320MHz. Desktop color depth is 32bpp.

             | 1280x720p50 | 1280x720p60 | 1920x1080p50 | 1920x1080p60
-------------+-------------+-------------+--------------+--------------
LIME (CAS=9) |      85     |      75     |     46 (**)  |    41 (**)
LIME (CAS=6) |     100     |      91     |     56 (**)  |    48 (**)
Mele A2000   |     151     |     148     |    140 (**)  |   136 (**)
Cubieboard1  |     166     |     166     |    161 (*)   |   157 (*)
-------------+-------------+-------------+--------------+--------------

 (*) minor occasional glitches on screen
(**) severe screen shaking effect is observed

Note that the window size is exactly the same in all tests. Only the
screen resolution is different, and this only affects how much of the
memory bandwidth is drained by maintaining the screen refresh.
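
To put rough numbers on the screen refresh cost, here is a
back-of-the-envelope sketch. It assumes DDR memory (two transfers per
clock) and ignores blanking intervals, DRAM refresh and controller
inefficiency, so the theoretical peak is optimistic. The function names
are made up for illustration.

```python
# Rough estimate only: theoretical peak bandwidth, no blanking intervals,
# no DRAM refresh or controller overhead taken into account.

def peak_dram_mb_s(clock_mhz, bus_width_bits):
    """Theoretical peak bandwidth in MB/s, assuming DDR (2 transfers/clock)."""
    return clock_mhz * 2 * bus_width_bits / 8

def scanout_mb_s(width, height, refresh_hz, bpp=32):
    """Bytes per second (in MB/s) read just to refresh the visible screen."""
    return width * height * (bpp // 8) * refresh_hz / 1e6

# (dram clock in MHz, dram_bus_width) as listed for the tested boards
boards = {
    "LIME": (480, 16),
    "Mele A2000": (360, 32),
    "Cubieboard1": (480, 32),
}

# (width, height, refresh rate) for the tested video modes
modes = {
    "1280x720p50": (1280, 720, 50),
    "1280x720p60": (1280, 720, 60),
    "1920x1080p50": (1920, 1080, 50),
    "1920x1080p60": (1920, 1080, 60),
}

for name, (clock, bus_bits) in boards.items():
    peak = peak_dram_mb_s(clock, bus_bits)
    for mode, (w, h, hz) in modes.items():
        share = 100 * scanout_mb_s(w, h, hz) / peak
        print(f"{name:12s} {mode:13s} {share:5.1f}% of {peak:.0f} MB/s peak")
```

Under these assumptions, 1080p60 at 32bpp needs roughly 498 MB/s for
the screen refresh alone, which is about a quarter of the theoretical
peak of a 16-bit bus at 480MHz. Passing bpp=16 halves that figure.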

With the 16-bit memory bus width, 3D graphics performance degrades
quickly as the screen resolution and refresh rate increase. Using
the 50Hz monitor refresh rate is more important than ever: it leaves
more memory bandwidth for rendering, and perfect tear-free animation
at 50Hz is also somewhat less demanding than at 60Hz.
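
The size of the drop can be read directly off the table above. A small
sketch that computes the relative loss between 1280x720p50 and
1920x1080p60 for each configuration (the scores are copied from the
table; the helper name is just for illustration):

```python
# glmark2-es2 scores from the table, in column order:
# 1280x720p50, 1280x720p60, 1920x1080p50, 1920x1080p60
scores = {
    "LIME (CAS=9)": [85, 75, 46, 41],
    "LIME (CAS=6)": [100, 91, 56, 48],
    "Mele A2000": [151, 148, 140, 136],
    "Cubieboard1": [166, 166, 161, 157],
}

def percent_drop(baseline, value):
    """Relative loss of `value` compared to `baseline`, in percent."""
    return 100 * (baseline - value) / baseline

for board, row in scores.items():
    # Compare the least demanding mode with the most demanding one
    drop = percent_drop(row[0], row[-1])
    print(f"{board:12s} loses {drop:4.1f}% going from 720p50 to 1080p60")
```

Both LIME configurations lose about half of their score, while the
boards with the 32-bit bus lose less than 10%.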

The performance of hardware accelerated video decoding using CedarX
with 1080p monitor is going to be really interesting too. And common
sense dictates that it is very important not to waste memory bandwidth
unnecessarily.

BTW, for 32-bit memory bus width, Mali performance does not seem to
be affected that much by the screen resolution and refresh rate
increase. But software graphics rendering done on the CPU (or any
other memory intensive activity) is still taking a performance hit
even with the 32-bit memory bus:
    
http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html
The X11 desktop performance on LIME is going to be challenging
at high screen resolutions too, unless the desktop color depth
is reduced to 16bpp.

The memory timings with dram_cas=9 also hurt performance.
While dram_cas=6 might be considered an unsafe choice for
480MHz, it would be really great if we could find settings
that are both safe and faster than the current default.
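
For reference, the CAS setting is counted in memory clock cycles, so
the same value means a different absolute latency at a different clock.
A minimal sketch of the conversion (the helper name is an assumption,
and whether a given latency is safe depends on the actual DRAM chips,
which is not evaluated here):

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """CAS latency in nanoseconds for a cycle count at a given DRAM clock."""
    return cas_cycles * 1000 / clock_mhz

# (dram_cas, dram clock in MHz) for the tested configurations
configs = {
    "LIME (CAS=9)": (9, 480),
    "LIME (CAS=6)": (6, 480),
    "Mele A2000": (6, 360),
    "Cubieboard1": (6, 480),
}

for name, (cas, clock) in configs.items():
    print(f"{name:12s} CAS={cas} @ {clock}MHz -> {cas_latency_ns(cas, clock):.2f} ns")
```

So dram_cas=9 at 480MHz is 18.75 ns, and dram_cas=6 at 480MHz is
12.5 ns.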

I was not going to sugar-coat anything here. And I understand that
using the 16-bit memory bus width was a side effect of extreme cost
reduction. The primary LIME competitors are likely not high end
ARM devices, but Raspberry Pi and low cost microcontrollers.
The whole point of my e-mail is just that clearly outperforming
them may need some tuning on the software side for better
utilization of the memory bandwidth that is available.

Independent verification of the benchmark results is always welcome.
For people who do not have a LIME board yet: just taking the u-boot
dram settings from LIME and using them on Cubieboard1 appears
to result in exactly the same memory performance. So Cubieboard1
hardware can be used to simulate the 16-bit memory bus
performance too.

-- 
Best regards,
Siarhei Siamashka
