On Fri, 28 Mar 2014 14:21:33 +0100
Jens Kuske <jensku...@gmail.com> wrote:

> > On 03/28/2014 11:42 AM, Siarhei Siamashka wrote:
> >>      https://github.com/ssvb/lima-memtester
> >>
> >> Basically, that's just a single static binary with no dependencies.
> >> It combines a memtester tool with a simple spinning textured cube
> >> demo from the work-in-progress free open source Mali400 driver
> >> project http://limadriver.org/
> 
> Nice tool, I already used memtester but didn't bother to load the GPU.
> 
> >> It would be 100% free software using only free software tools if the
> >> open source lima shader compiler could handle vertex shaders. Right now
> >> only the fragment shader binary had been generated using the open
> >> source shader compiler. But the vertex shader binary (injected as an
> >> array into the source code) still used the output of the proprietary
> >> shader compiler from the libMali.so blob.
> >>
> >> Anyway, my Cubietruck passes the test at 456MHz dram clock speed and
> >> fails at 480MHz. And my Cubieboard2 passes it at 504MHz but fails
> >> at 528MHz. The second patch from Jens Kuske unfortunately does not
> >> seem to have any visible effect here and does not change anything
> >> for me.

BTW, I did not mean that the tRFC patch is bad or anything. Just that
no impact is observable on my hardware.

Now I also have some data ready for the dcdc3 voltage tweaks as
discussed on IRC a few days ago:
    http://irclog.whitequark.org/linux-sunxi/2014-04-01#6986335;

Your observation about the dcdc3 voltage being set wrong was really
spot on! Kudos for the multimeter based debugging :-)

And dcdc3 really affects the maximum stable memory clock speed ("stable"
in terms of passing the lima-memtester tests). Though changing it to
1.3v (as needed for MBUS) was not really enough to get clearly distinct
results, so I pushed it even further to 1.35v. The A20 datasheet
does not seem to specify the valid voltage range for VDD-DLL:
    https://github.com/OLIMEX/OLINUXINO/blob/master/HARDWARE/A20-PDFs/A20%20Datasheet%20v1.0%2020130227.pdf?raw=true
But I optimistically assumed that up to 1.4v might be fine.

Anyway, with dcdc3 voltage set to 1.35v, I could run lima-memtester
successfully for many hours on my Cubieboard2 with dram clock set
to 528MHz. That's at least 24MHz more than was previously possible
with dcdc3 set to 1.25v.

The Cubietruck is a slightly different story. In general, when running
lima-memtester, the following outcomes are possible:
  a) it runs indefinitely or until terminated by the user (success)
  b) the device deadlocks (an obvious fail)
  c) the memtester log starts showing errors (a fail too)

With dcdc3 originally set to 1.25v on my Cubietruck, lima-memtester
fails pretty fast (typically in less than 15 minutes) and most of the
failures are device deadlocks. With dcdc3 increased to 1.35v,
lima-memtester still fails, but takes much longer and the failures are
reported as memtester errors in the log. Again, testing both with and
without the tRFC patch in u-boot does not seem to change anything.

I have the following preliminary theory. It looks like the deadlocks
and memtester log errors are the symptoms of two (or more?) distinct
problems.

The deadlocks seem to be caused by insufficient dcdc3 voltage, and some
percentage of A20 chips may be really sensitive to low dcdc3. I wonder
if that's the primary cause of the 480MHz dram clock stability problems
on some small percentage of Cubieboard2 devices?
    http://irclog.whitequark.org/linux-sunxi/2013-07-29#4520613;

And the regular memtester errors with 1.35v dcdc3 are probably
indicating that the traces to DDR3 are not so good on the Cubietruck
PCB. Or the timings are too tight for one of the unlucky DDR3 chips in
my Cubietruck. Either way, this is probably not dcdc3 voltage related.
And not tRFC related either.

I'll keep running tests and will provide an update if something new
gets discovered.

> Looks like my cubietruck is a bad test device, it successfully run
> memtester at 504MHz for 24h and also lima-memtester runs good till now
> (two loops finished ok).

You have an unusual definition of "bad" ;-)

I'm not very happy about having a Cubietruck with slow dram, because
running below the sunxi-typical 480MHz dram speed may affect the
credibility of benchmark results. But using Cubietruck instead of Cubieboard2 is
important when testing for bus address calculation bugs on systems
with 2GiB of RAM.

I myself would prefer to have it the other way around: a Cubietruck
that can reliably clock dram at 480MHz, and a Cubieboard2 that can't.

> I didn't want to stir up too much hope for faster memory, only wanted to
> mention the possibility.
> There are many other dram timing parameters that depend on things like
> clock speed but don't get calculated anywhere. They must be somewhere in
> .tpr[0-2] and therefore fixed at some (hopefully big enough) value.
> The tRFC was fixed for 400MHz, so with some bad luck the other
> parameters are also dimensioned for 400MHz.

There are generally two sets of these magical .tpr settings. One is
typical for the dram settings with .cas=6:
    https://github.com/linux-sunxi/u-boot-sunxi/tree/96510e1eeae4/board/sunxi/dram_cubieboard.c
And another is typical for dram settings with .cas=9:
    https://github.com/linux-sunxi/u-boot-sunxi/tree/96510e1eeae4/board/sunxi/dram_cubieboard2.c

If we look at the "Standard Speed Bins" from the DDR3 specification,
then we can see that cas=6 is typical for the slowest flavour of
DDR3-800 (which translates to 400MHz memory clock speed) and cas=9 is
typical for the slowest DDR3-1333 (which translates to 667MHz clock
speed) listed in these bins.

I would guess that the .tpr settings, bundled with .cas=6, are likely
assuming 400MHz memory clock speed. Which also agrees with the tRFC
hardcoded values as you mentioned above. And also with the A10
and A20 user manuals, which specify 0~400MHz range for SDRAM_clk.

The .cas=9 style sets of timings are likely very conservative and
assuming memory clock speeds up to 667MHz if we take the cas value
as a hint. If we are running dram at around 480MHz, these settings
may unnecessarily sacrifice some performance.
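
To put rough numbers on this, here is a small Python sketch that converts
a CAS latency given in clock cycles into nanoseconds. The function name
is made up for illustration and nothing here is taken from the u-boot
code; it only uses the cas values and clock speeds mentioned above.

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Convert a CAS latency in clock cycles to nanoseconds."""
    # One cycle lasts 1000 / clock_mhz nanoseconds.
    return cas_cycles * 1000.0 / clock_mhz

# The two speed bin cases from the text, then both cas values at 480MHz.
for cas, mhz in ((6, 400), (9, 667), (6, 480), (9, 480)):
    print(f"cas={cas} at {mhz}MHz: {cas_latency_ns(cas, mhz):.2f} ns")
```

With .cas=9 at 480MHz the absolute latency works out to 18.75 ns, compared
to 15 ns for .cas=6 at 400MHz. On the other hand, .cas=6 at 480MHz is only
12.5 ns, which is tighter than what the cas=6 speed bin implies.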

We just seem to be selecting between these two sets without looking too
much into it. For example, the A10-OLinuXino-Lime memory settings got
changed from the .cas=9 set to the .cas=6 set some time ago:
    https://github.com/linux-sunxi/u-boot-sunxi/commit/eccc92de2d13
Which is good for performance and helps to reduce the impact of
having only a half width 16-bit memory interface:
    https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg00960.html
However, having a dram configuration specifically tailored for the
480-533MHz range would be best. Something interpolated between
the .cas=6 and .cas=9 sets of parameters?
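
One possible way to approach such "interpolation" is to treat a timing
parameter as an absolute time and convert it to whole clock cycles for the
target frequency. The Python sketch below is only an illustration under
that assumption: the helper name is hypothetical, and the 15 ns figure is
simply what cas=6 at 400MHz works out to, not a value read from any
datasheet or from the .tpr registers.

```python
def cycles_for(t_ps, clock_mhz):
    """Smallest whole number of clock cycles that covers t_ps picoseconds."""
    # cycles = ceil(t_ps * clock_mhz / 1e6); integer math avoids float rounding.
    return (t_ps * clock_mhz + 999_999) // 1_000_000

# Assumed absolute latency: 6 cycles at 400MHz = 15 ns = 15000 ps.
T_PS = 15_000

for mhz in (400, 480, 504, 528, 533):
    print(f"{mhz}MHz: {cycles_for(T_PS, mhz)} cycles")
```

Under this assumption, a parameter that needs 6 cycles at 400MHz needs 8
cycles anywhere in the 480-533MHz range, which sits between the .cas=6
and .cas=9 values.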

Now here somebody may point out that running dram at 480MHz may
not be a very good idea in general, because it is outside of the
specified 0~400MHz range for A10 and A20. But if we check the A13
manual, then we can see "Support DDR2 SDRAM and DDR3 SDRAM up to 533MHz"
there. It looks like Allwinner is rounding the maximum supported DRAM
clock speed down to one of the standard values (400MHz, 533MHz,
667MHz) without bothering about anything in between.
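
Just to illustrate this guess (it is speculation about how the manuals
are written, not documented Allwinner behaviour), the rounding could look
like the Python sketch below. The function name and the list of standard
values are assumptions taken from the paragraph above.

```python
from bisect import bisect_right

# Standard clock values mentioned above, in MHz (sorted ascending).
STANDARD_MHZ = [400, 533, 667]

def advertised_max_mhz(real_max_mhz):
    """Round a real maximum clock down to the nearest standard value.

    Returns None if the real maximum is below the lowest standard value.
    """
    i = bisect_right(STANDARD_MHZ, real_max_mhz)
    return STANDARD_MHZ[i - 1] if i else None

for real in (480, 533, 600):
    print(f"real max {real}MHz -> advertised {advertised_max_mhz(real)}MHz")
```

If this is what happens, a chip that is actually fine at 480MHz or a bit
more would still be listed as 0~400MHz.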

> > But all hope is not yet lost, maybe on badly designed boards 
> > (tablets/mele) it does work better with the right timings.
> 
> The refresh timings aren't influenced much by board characteristics as
> far as I know, it's a DRAM chip internal thing. It could help to stay
> stable at higher temperature or for bad quality DRAM chips.

-- 
Best regards,
Siarhei Siamashka
