Just an idea: can we convince the most popular distros to add an automatic
stability test which sends the data to a server? It could be advertised by
e.g. the motd banner. I think we could then collect enough data to set
this right.
What a pity that Allwinner can't provide us with safe numbers...
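
A minimal sketch of what such a test report could look like. Everything
here is an assumption for illustration: the field names, the
build_stability_report() and send_report() helpers, and the server URL
are hypothetical, and no such service exists as far as this thread shows.

```python
import json
import platform
import urllib.request


def build_stability_report(board, cpu_mhz, vdd_cpu_mv, test_name, passed, runtime_s):
    """Assemble one test result as a JSON string (hypothetical schema)."""
    report = {
        "board": board,            # e.g. "cubietruck"
        "cpu_mhz": cpu_mhz,        # CPU clock during the test
        "vdd_cpu_mv": vdd_cpu_mv,  # CPU core voltage in millivolts
        "test": test_name,         # which stress test was run
        "passed": bool(passed),
        "runtime_s": runtime_s,    # how long the test ran
        "kernel": platform.release(),
    }
    return json.dumps(report, sort_keys=True)


def send_report(url, payload, timeout=10):
    """POST the JSON payload to a collection server (URL is hypothetical)."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

A distro could call build_stability_report() after each test run and only
call send_report() if the user has opted in.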


2014-05-13 0:47 GMT+02:00 Siarhei Siamashka <siarhei.siamas...@gmail.com>:

> On Sun, 11 May 2014 22:58:31 +0200
> Hans de Goede <hdego...@redhat.com> wrote:
>
> > Hi,
> >
> > On 05/11/2014 10:43 PM, Hans de Goede wrote:
> > > Hi,
> > >
> > > On 05/11/2014 11:53 AM, Siarhei Siamashka wrote:
> > >> It has been confirmed that a substantial percentage of cubieboard2
> > >> and cubietruck users are having stability issues. These issues are
> > >> caused by having various voltages optimistically configured way too
> > >> low. Because each unit has its own tolerances, not everyone can
> > >> easily reproduce these problems.
> > >>
> > >> To address the issue, we take the updated settings from:
> > >> https://github.com/cubieboard/cubie_configs/tree/09e511721697/sysconfig/linux
> > >> This repository is relevant because it is referenced from:
> > >> http://docs.cubieboard.org/tutorials/ct1/development/compiling_latest_kernel_for_cubietruck_cubieboard3
> > >> Please note that there is also a sunxi-boards repository fork at
> > >> https://github.com/cubieboard (with bad settings) and this makes
> > >> things more confusing than necessary.
> > >>
> > >> Signed-off-by: Siarhei Siamashka <siarhei.siamas...@gmail.com>
> > >> ---
> > >>  sys_config/a20/cubieboard2.fex | 18 +++++++++---------
> > >>  sys_config/a20/cubietruck.fex  | 16 ++++++++--------
> > >>  2 files changed, 17 insertions(+), 17 deletions(-)
> > >>
> > >> diff --git a/sys_config/a20/cubieboard2.fex b/sys_config/a20/cubieboard2.fex
> > >> index 1436df8..c8c9c74 100644
> > >> --- a/sys_config/a20/cubieboard2.fex
> > >> +++ b/sys_config/a20/cubieboard2.fex
> > >> @@ -1,14 +1,14 @@
> > >>  [product]
> > >>  version = "100"
> > >> -machine = "cubieboard"
> > >> +machine = "cubieboard2"
> > >>
> > >>  [platform]
> > >>  eraseflag = 0
> > >>
> > >>  [target]
> > >>  boot_clock = 912
> > >> -dcdc2_vol = 1400
> > >> -dcdc3_vol = 1250
> > >> +dcdc2_vol = 1450
> > >
> > > This makes no sense, since in the dvfs table there is:
> > > max_freq = 912000000
> > > LV2_freq = 912000000
> > > LV2_volt = 1425
>
> My understanding is that the dcdc2_vol value is used when cpufreq
> is disabled and the dvfs part of the fex has no effect. So I think
> it tries to play defensively and safeguard against the bootloader
> setting something like:
>  LV1_freq = 1008000000
>  LV1_volt = 1450
>
> Sure, in u-boot-sunxi we have the following code, which is hardcoding
> the CPU clock frequency to 912MHz:
>
> #ifdef CONFIG_SUN7I
>                 clock_set_pll1(912000000);
> #else
>                 clock_set_pll1(1008000000);
> #endif
>
> However this fex and u-boot interaction and implicit dependency
> is quite nasty. It has already bitten us in the butt with the
> u-boot setting dcdc3 to 1.3V and fex reverting it back to 1.25V.
>
> Anyway, what do you suggest?
>
> > > So 1.425 volt would make a lot more sense. Also page 31
> > > of A20+Datasheet+v1.0+20130227.pdf says that the MAX
> > > CPU and system voltage is 1.425 volts. So what this patch
> > > effectively does is overvolt the CPU cores and base system.
> > >
> > > Now overvolting is a well known trick when doing overclocking,
> > > the question is do we really want to ship a config which
> > > overvolts by default?
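
The consistency argument above (boot voltage versus the dvfs table and
the datasheet limit) can be checked mechanically. A minimal sketch,
assuming the fex file is plain "[section]" / "key = value" text, that
the dvfs entries live in a section named [dvfs_table], that boot_clock
is in MHz, dvfs frequencies are in Hz, and voltages are in mV. The
parse_fex() and check_boot_voltage() helpers are hypothetical and are
not part of sunxi-tools.

```python
A20_MAX_MV = 1425  # CPU voltage limit quoted from the A20 datasheet in this thread

SAMPLE = """
[target]
boot_clock = 912
dcdc2_vol = 1450

[dvfs_table]
max_freq = 912000000
LV2_freq = 912000000
LV2_volt = 1425
"""


def parse_fex(text):
    """Parse fex text into {section: {key: value}} (simplified parser)."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip().strip('"')
    return sections


def check_boot_voltage(sections, max_mv=A20_MAX_MV):
    """Return warnings about dcdc2_vol versus the limit and the dvfs table."""
    target = sections["target"]
    dvfs = sections.get("dvfs_table", {})
    boot_hz = int(target["boot_clock"]) * 1_000_000
    boot_mv = int(target["dcdc2_vol"])
    warnings = []
    if boot_mv > max_mv:
        warnings.append(f"dcdc2_vol {boot_mv} mV exceeds the {max_mv} mV limit")
    # Compare against every dvfs level that runs at the boot clock.
    for key, value in dvfs.items():
        if key.startswith("LV") and key.endswith("_freq") and int(value) == boot_hz:
            volt = dvfs.get(key[: -len("_freq")] + "_volt")
            if volt is not None and int(volt) != boot_mv:
                warnings.append(
                    f"dcdc2_vol {boot_mv} mV differs from {key[:-5]}_volt {volt} mV"
                )
    return warnings


for message in check_boot_voltage(parse_fex(SAMPLE)):
    print(message)
```

With the values from the patch, both checks fire. Whether the datasheet
limit is the right bound is exactly what is being debated here, so
max_mv is left as a parameter.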
>
> The A10 datasheet says that the VDD voltage limit is 1.3V, however
> this is not stopping anyone from using 1.4V on A10.
>
> The kernel sources and documentation are full of bullshit numbers.
> Apparently almost everything has to be found by the trial and
> error method.
>
> > > My first instinctive reaction is no we don't. But if this
> > > cures our stability issues, then I think it makes sense to
> > > do so, in this case u-boot should be changed to init the
> > > cpu voltage to 1.425 volt by default too.
>
> Maybe.
>
> > Note all these new settings significantly exceed the
> > official allwinner settings / the settings found in any
> > other a20 device fex file.
>
> What are the official allwinner settings?
>
> > If we want to make these changes, we should probably change
> > this for all boards.
>
> The situation is pretty much out of control. Having these
> settings in the fex files allows the device manufacturers or
> distro/image maintainers to hurt themselves and their users
> quite easily. And they already do it.
>
> I have already said this before: IMHO the only solution to
> contain the damage is the availability of hardware reliability
> testing tools.
>
> > Follow-up question: have you tried overvolting your A10 Lime
> > to 1.425 volt to see if that fixes the L2 cache issues you've
> > been seeing there?
>
> Yes
>
> > I know you said 1.450 volt fixes it, but I wonder if 1.425 volt
> > fixes it too,
>
> No. In fact even running with 1.450V and 1008MHz clock speed
> eventually fails on my A10-Lime. It looks like this:
>   1.400V - instant failure on every libjpeg-turbo decoding attempt
>   1.425V - very fast failure
>   1.450V - needs to run for many hours to reproduce
>
> With just a single device tested, we can only assume that it might
> be a poor sample of A10. We have no idea how bad or how common it
> can be. Or whether it could be related to something like unusually
> high voltage drop between the AXP209 and A10, somehow specific to
> the A10-Lime PCB (taking a hint from Koen).
>
> > because in that case it may make sense to simply make 1.425V the
> > default voltage (max speed) everywhere.
> >
> > Likewise it would be interesting to repeat the L2 cache failures
> > on (very) low clock speeds on the A20 with the new DVFS table.
>
> There are no failures with the new DVFS table on my Cubietruck.
> And yes, I have adjusted the min_freq to actually test the low
> clock speeds.
>
> Still we are not moving anywhere unless people really start
> testing their hardware themselves. I can't afford to be wasting
> more of my time on this stuff. And I only have a few boards,
> which is a rather small sample base.
>
> I need to make the tools more idiot-proof, write something
> comprehensive at http://linux-sunxi.org/Hardware_Reliability_Tests
> and just use this link on everyone complaining about their
> hardware deadlocking once in a while :-)
>
> --
> Best regards,
> Siarhei Siamashka
>
> --
> You received this message because you are subscribed to the Google Groups
> "linux-sunxi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to linux-sunxi+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
