On Sun, 25 May 2014 11:43:52 +0200
Hans de Goede <[email protected]> wrote:
> Hi,
>
> On 05/25/2014 12:34 AM, Olliver Schinagl wrote:
> > Hey all,
> >
> > I am just venting here mostly as I don't have much time to test things
> > really just yet.
> >
> > As you know I am writing a book based on the A10/A20 stuff and wanted
> > to double-check/upgrade the Fedora 19 chapter to F20. After an hour of
> > figuring out why it kept crashing (I thought it was a power thing) and
> > trying various u-boots (supplied in the f20 image and from our current
> > git) I suddenly remembered that a FAST_MBUS setting does not work on
> > my cubietruck. And indeed, removing FAST_MBUS from the boards.cfg made
> > the board spring to life.
> >
> > TL;DR:
> > We should really drop FAST_MBUS on all boards as a default, since we
> > want the repo to contain safe defaults. Once we get the whole memory
> > tweaking done more reliably, then sure, but until then, only users who
> > can and want to play with this setting should.
> >
> > Also, the F20 image should really be rebuilt with u-boots without FAST_MBUS.
> > Hans, I'm not sure if you know or could talk to the current fedora
> > maintainer for the Allwinner stuff, but I do feel it is somewhat
> > important that at least the image works for everyone 'out of the box',
> > right? So maybe publish an updated or r2 image?
>
> I've suggested a couple of times already to turn off FAST_MBUS, but
> others did not like the idea.
>
> And TBH they have a point, your board is the only one which has
> stability issues which seem to be caused by FAST_MBUS.
>
> Can you please work with Siarhei Siamashka to properly root
> cause this? If it really is FAST_MBUS I'm all for disabling it,
> but first let's make sure.

A status update. Olliver was kind enough to spend several hours of
his time running various DRAM tests a few days ago:
http://irclog.whitequark.org/linux-sunxi/2014-05-26#9139352;

These tests seem to have confirmed my earlier suspicions about
the unreliable hardware DQS gate training being at fault. On my
Cubietruck, the results of performing the DQS gate training
are the following, no matter whether FAST_MBUS is used or not:

    rslr0=00000249, rdgr0=000000AA

Or we can represent them as a bit more readable artificial
'dqs_gating_delay' variable, where each byte specifies the
DQS gating delay (measured in quarter cycles) for each of
the four DDR3 byte lanes (or in other words, the delay for
each of the 4 memory chips used in the Cubietruck):

    dqs_gating_delay=0x06060606

But Olliver got the following results on his Cubietruck:

    dqs_gating_delay=0x05060606 (without FAST_MBUS)
    dqs_gating_delay=0x05060605 (with FAST_MBUS)
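
To make the relation between the raw register values and the
'dqs_gating_delay' representation concrete, here is a minimal sketch
in C. The bit layout is an assumption inferred from the numbers quoted
above (3 bits per lane in rslr0 counting whole cycles, 2 bits per lane
in rdgr0 counting extra quarter cycles); the function names are made up
for illustration and are not taken from the u-boot sources.

```c
#include <stdint.h>

#define NUM_LANES 4 /* four DDR3 byte lanes, as on the Cubietruck */

/*
 * Combine raw rslr0/rdgr0 values into one byte per lane, each byte
 * holding the DQS gating delay in quarter cycles.
 *
 * ASSUMED layout (inferred from the values in this mail):
 *   rslr0: 3 bits per lane, delay in whole clock cycles
 *   rdgr0: 2 bits per lane, additional delay in quarter cycles
 */
static uint32_t dqs_gating_delay_from_regs(uint32_t rslr0, uint32_t rdgr0)
{
    uint32_t result = 0;

    for (unsigned lane = 0; lane < NUM_LANES; lane++) {
        uint32_t cycles   = (rslr0 >> (lane * 3)) & 0x7;
        uint32_t quarters = (rdgr0 >> (lane * 2)) & 0x3;

        result |= (cycles * 4 + quarters) << (lane * 8);
    }
    return result;
}

/* Return a bitmask of the lanes whose delay differs between a and b. */
static unsigned dqs_lanes_differing(uint32_t a, uint32_t b)
{
    unsigned mask = 0;

    for (unsigned lane = 0; lane < NUM_LANES; lane++) {
        if (((a ^ b) >> (lane * 8)) & 0xFF)
            mask |= 1u << lane;
    }
    return mask;
}
```

Under this assumed layout, rslr0=0x249 and rdgr0=0xAA decode to
0x06060606, and comparing 0x06060606 with 0x05060605 flags the highest
and the lowest byte as the ones that differ.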

This basically means that enabling FAST_MBUS indeed behaves very
much like the butterfly effect
(http://en.wikipedia.org/wiki/Butterfly_effect).
The hardware DQS training apparently has difficulty deciding
between a 5 and a 6 quarter cycle delay for one of the lanes, and
the FAST_MBUS configuration makes it flip to 5 (making the system
so unreliable that it even fails to boot). We would assume that
in case of doubt both of these settings should be more or
less equally good or bad, but apparently the hardware DQS gate
training also has a bias towards the lower values (probably
caused by rounding down instead of rounding to the nearest).
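
The suspected rounding bias can be illustrated with a toy example.
This is only a sketch of the guess made above, not a description of
what the hardware really does: the "measured" delays are invented
numbers, expressed in hundredths of a quarter cycle, and both function
names are hypothetical.

```c
/*
 * Quantize a delay given in hundredths of a quarter cycle
 * to a whole number of quarter cycles.
 */

/* Truncation: always rounds down, so 5.98 becomes 5. */
static unsigned quantize_down(unsigned delay_hundredths)
{
    return delay_hundredths / 100;
}

/* Round to nearest: 5.98 and 6.02 both become 6. */
static unsigned quantize_nearest(unsigned delay_hundredths)
{
    return (delay_hundredths + 50) / 100;
}
```

If the ideal delay sits just around 6 quarter cycles, a tiny
disturbance from 6.02 to 5.98 flips the truncated result from 6 to 5,
while rounding to the nearest value keeps it at 6 in both cases.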

If anybody is interested in more details about how this all
works, the relevant information can be found in the
"data training" section of the RK30XX manual.

In any case, Olliver tried to override this unreliable autodetection
and use the rslr0/rdgr0 settings obtained from my board. This seems
to have resolved the reliability problems.
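
For such an override, a 'dqs_gating_delay' value has to be split back
into the two register values. A minimal sketch in C follows, again
assuming 3 bits per lane in rslr0 (whole cycles) and 2 bits per lane in
rdgr0 (quarter cycles), which is inferred from the numbers in this mail.
The struct and function names are hypothetical, and actually writing
the values to the DRAM controller is deliberately left out.

```c
#include <stdint.h>

#define NUM_LANES 4 /* four DDR3 byte lanes, as on the Cubietruck */

/* Hypothetical container for the two register values. */
struct dqs_gating_regs {
    uint32_t rslr0; /* ASSUMED: 3 bits per lane, whole clock cycles */
    uint32_t rdgr0; /* ASSUMED: 2 bits per lane, quarter cycles */
};

/*
 * Split a per-lane delay (one byte per lane, in quarter cycles)
 * into the register values that would have to be programmed.
 */
static struct dqs_gating_regs dqs_gating_delay_to_regs(uint32_t delay)
{
    struct dqs_gating_regs regs = { 0, 0 };

    for (unsigned lane = 0; lane < NUM_LANES; lane++) {
        uint32_t quarter_cycles = (delay >> (lane * 8)) & 0xFF;

        /* Upper part: whole cycles, limited to the 3-bit field. */
        regs.rslr0 |= ((quarter_cycles >> 2) & 0x7) << (lane * 3);
        /* Lower part: remaining quarter cycles (0..3). */
        regs.rdgr0 |= (quarter_cycles & 0x3) << (lane * 2);
    }
    return regs;
}
```

Under these assumptions, 0x06060606 maps back to rslr0=0x249 and
rdgr0=0xAA, while 0x05060605 would give rslr0=0x249 and rdgr0=0x69.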

Now about the next steps. On IRC I kept asking Olliver for two days
in a row to check whether his Cubietruck can work with a higher DRAM
clock frequency than the lowly default 432MHz. If his board is
anything like mine or Jens Kuske's, then it should be bootable
with DRAM clocked at 600MHz, and pass lima-memtester tests at
around 528MHz or 540MHz (with dram_tpr3=0x7222 and dcdc3_vol=1300).
Knowing the DRAM clock speed limit of Olliver's board and
comparing it with the other boards would tell us whether it
is just the hardware DQS training block that is broken and everything
else is the same, or whether something else is also different. We are
still missing this information.

As for the u-boot DRAM code, we would need to implement some sort
of workaround or perform the DQS gate training in software. This
is nothing new, because the hardware DDR3 calibration seems
to be broken in almost every SoC around (if we check
the errata lists and the patches floating around) :-)
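
One generic way to do gate training in software is to sweep the delay
for a lane, find the range of values for which a read test passes, and
pick the middle of that range. The sketch below shows only this idea.
It is not an existing u-boot implementation: the callback type, the
delay limit of 31 quarter cycles and the simulated test function are
all assumptions made for illustration.

```c
#include <stdbool.h>

#define MAX_DELAY 31 /* ASSUMED upper limit, in quarter cycles */

/*
 * Hypothetical callback: apply 'delay' to 'lane' and report
 * whether a DRAM read test passes with this setting.
 */
typedef bool (*lane_test_fn)(unsigned lane, unsigned delay, void *ctx);

/* Stand-in for a real memory test: passes inside [lo, hi]. */
struct window {
    unsigned lo;
    unsigned hi;
};

static bool simulated_window_test(unsigned lane, unsigned delay, void *ctx)
{
    const struct window *w = ctx;

    (void)lane; /* the simulation treats all lanes alike */
    return delay >= w->lo && delay <= w->hi;
}

/*
 * Sweep the delay and return the middle of the first contiguous
 * passing window, or -1 if no delay passes. Ties are rounded up
 * to avoid a bias towards the lower values.
 */
static int train_lane(unsigned lane, lane_test_fn test, void *ctx)
{
    int first = -1, last = -1;

    for (unsigned delay = 0; delay <= MAX_DELAY; delay++) {
        if (test(lane, delay, ctx)) {
            if (first < 0)
                first = (int)delay;
            last = (int)delay;
        } else if (first >= 0) {
            break; /* left the passing window */
        }
    }
    if (first < 0)
        return -1;
    return (first + last + 1) / 2;
}
```

With a simulated passing window of 5 to 7 quarter cycles, train_lane()
picks 6. A real implementation would replace the simulated test with
an actual read pattern check and would have to run from SRAM, since
DRAM is not usable yet at that point.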

Preferably all of this has to be resolved and tested on multiple
devices before the next u-boot merge window opens. We really
depend on the users' feedback and cooperation because every
device may potentially have its own individual quirks. And I
prefer to have predictable hardware behaviour, reasonable
explanations for everything and no unresolved mysteries.

Thanks.
--
Best regards,
Siarhei Siamashka