Just an idea: can we convince the most popular distros to add an automatic stability test that sends its results to a server? It could be advertised via, e.g., the mod banner. I think we could then get enough data to set this right. What a pity that Allwinner can't provide us with safe numbers...
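
To make the idea a bit more concrete, here is a rough sketch of what the server side could do with the collected reports. Everything in it is an assumption on my part: the report layout (board, MHz, mV, passed) and the function names are made up, and nothing like this exists yet. The only sample data are Siarhei's three A10-Lime observations quoted below, all of which count as failures.

```python
from collections import defaultdict

# Hypothetical report layout: (board, cpu_mhz, vdd_cpu_mv, passed).
# The three entries are the A10-Lime observations from this thread
# (1008MHz, all of them eventually failing).
reports = [
    ("a10-lime", 1008, 1400, False),  # instant failure
    ("a10-lime", 1008, 1425, False),  # very fast failure
    ("a10-lime", 1008, 1450, False),  # fails after many hours
]

def summarize(reports):
    """Return {(board, mhz, mv): [passes, failures]} for all reports."""
    stats = defaultdict(lambda: [0, 0])
    for board, mhz, mv, passed in reports:
        stats[(board, mhz, mv)][0 if passed else 1] += 1
    return dict(stats)

def lowest_clean_voltage(reports, board, mhz, min_reports=1):
    """Lowest voltage (mV) with at least min_reports passes and no failures.

    Returns None when no tested voltage qualifies.
    """
    candidates = [
        mv
        for (b, f, mv), (ok, bad) in summarize(reports).items()
        if b == board and f == mhz and bad == 0 and ok >= min_reports
    ]
    return min(candidates) if candidates else None

print(lowest_clean_voltage(reports, "a10-lime", 1008))
```

With only one device reporting, the answer for the A10-Lime is "no known safe voltage", which is exactly why a larger sample would help. A real service would want a much higher min_reports before trusting any number.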

2014-05-13 0:47 GMT+02:00 Siarhei Siamashka <siarhei.siamas...@gmail.com>:
> On Sun, 11 May 2014 22:58:31 +0200
> Hans de Goede <hdego...@redhat.com> wrote:
>
> > Hi,
> >
> > On 05/11/2014 10:43 PM, Hans de Goede wrote:
> > > Hi,
> > >
> > > On 05/11/2014 11:53 AM, Siarhei Siamashka wrote:
> > >> It has been confirmed that a substantial percentage of cubieboard2
> > >> and cubietruck users are having stability issues. These issues are
> > >> caused by having various voltages optimistically configured way too
> > >> low. Because each unit has its own tolerances, not everyone can
> > >> easily reproduce these problems.
> > >>
> > >> To address the issue, we take the updated settings from:
> > >>     https://github.com/cubieboard/cubie_configs/tree/09e511721697/sysconfig/linux
> > >> This repository is relevant because it is referenced from:
> > >>     http://docs.cubieboard.org/tutorials/ct1/development/compiling_latest_kernel_for_cubietruck_cubieboard3
> > >> Please note that there is also a sunxi-boards repository fork at
> > >> https://github.com/cubieboard (with bad settings) and this makes
> > >> things more confusing than necessary.
> > >>
> > >> Signed-off-by: Siarhei Siamashka <siarhei.siamas...@gmail.com>
> > >> ---
> > >>  sys_config/a20/cubieboard2.fex | 18 +++++++++---------
> > >>  sys_config/a20/cubietruck.fex | 16 ++++++++--------
> > >>  2 files changed, 17 insertions(+), 17 deletions(-)
> > >>
> > >> diff --git a/sys_config/a20/cubieboard2.fex b/sys_config/a20/cubieboard2.fex
> > >> index 1436df8..c8c9c74 100644
> > >> --- a/sys_config/a20/cubieboard2.fex
> > >> +++ b/sys_config/a20/cubieboard2.fex
> > >> @@ -1,14 +1,14 @@
> > >>  [product]
> > >>  version = "100"
> > >> -machine = "cubieboard"
> > >> +machine = "cubieboard2"
> > >>
> > >>  [platform]
> > >>  eraseflag = 0
> > >>
> > >>  [target]
> > >>  boot_clock = 912
> > >> -dcdc2_vol = 1400
> > >> -dcdc3_vol = 1250
> > >> +dcdc2_vol = 1450
> > >
> > > This makes no sense, since in the dvfs table there is:
> > > max_freq = 912000000
> > > LV2_freq = 912000000
> > > LV2_volt = 1425
>
> My understanding is that the dcdc2_vol voltage value is in use in
> the case if cpufreq is disabled and when the dvfs part of fex has
> no effect. So I think that it tries to play defensively and
> safeguard against the bootloader setting something like:
> LV1_freq = 1008000000
> LV1_volt = 1450
>
> Sure, in u-boot-sunxi we have the following code, which is hardcoding
> the CPU clock frequency to 912MHz:
>
> #ifdef CONFIG_SUN7I
>         clock_set_pll1(912000000);
> #else
>         clock_set_pll1(1008000000);
> #endif
>
> However this fex and u-boot interaction and implicit dependency
> is quite nasty. It has already bitten us in the butt with the
> u-boot setting dcdc3 to 1.3V and fex reverting it back to 1.25V
>
> Anyway, what do you suggest?
>
> > > So 1.425 volt would make a lot more sense. Also Page 31
> > > of A20+Datasheet+v1.0+20130227.pdf says that the MAX
> > > CPU and system voltage is 1.425 volts. So what this patch
> > > effectively does is overvolt the CPU cores and base system.
> > >
> > > Now overvolting is a well known trick when doing overclocking,
> > > the question is do we really want to ship a config which
> > > overvolts by default?
>
> The A10 datasheet says that the VDD voltage limit is 1.3V, however
> this is not stopping anyone from using 1.4V on A10.
>
> The kernel sources and documentation are full of bullshit numbers.
> Apparently almost everything has to be found by the trial and
> error method.
>
> > > My first instinctive reaction is no we don't. But if this
> > > cures our stability issues, then I think it makes sense to
> > > do so, in this case u-boot should be changed to init the
> > > cpu voltage to 1.425 volt by default too.
>
> Maybe.
>
> > Note all these new settings significantly exceed the
> > official allwinner settings / the settings found in any
> > other a20 device fex file.
>
> What are the official allwinner settings?
>
> > If we want to make these changes, we should probably change
> > this for all boards.
>
> The situation is pretty much out of control. Having these
> settings in the fex files allows the device manufacturers or
> distro/image maintainers to hurt themselves and their users
> quite easily. And they already do it.
>
> I have already said this before: IMHO the only solution to
> contain the damage is the availability of hardware reliability
> testing tools.
>
> > Follow up question, have you tried overvolting your A10 Lime
> > to 1.425 volt to see if that fixes the L2 cache issues you've
> > been seeing there.
>
> Yes
>
> > I know you said 1.450 volt fixes it, but I wonder if 1.425 volt
> > fixes it too
>
> No. In fact even running with 1.450V and 1008MHz clock speed
> eventually fails on my A10-Lime. It looks like this:
>     1.400V - instant failure on every libjpeg-turbo decoding attempt
>     1.425V - very fast failure
>     1.450V - needs to run for many hours to reproduce
>
> With just a single device tested, we can only assume that it might
> be a poor sample of A10. We have no idea how bad or how common it
> can be. Or whether it could be related to something like unusually
> high voltage drop between the AXP209 and A10, somehow specific to
> the A10-Lime PCB (taking a hint from Koen).
>
> > because in that case it may make sense to simply make 1.425V the
> > default voltage (max speed) everywhere.
> >
> > Likewise it would be interesting to repeat the L2 cache failures
> > on (very) low clock speeds on the A20 with the new DVFS table.
>
> There are no failures with the new DVFS table on my Cubietruck.
> And yes, I have adjusted the min_freq to actually test the low
> clock speeds.
>
> Still we are not moving anywhere unless people really start
> testing their hardware themselves. I can't afford to be wasting
> more of my time on this stuff. And I only have a few boards,
> which is a rather small sample base.
>
> I need to make the tools more idiot-proof, write something
> comprehensive at http://linux-sunxi.org/Hardware_Reliability_Tests
> and just use this link on everyone complaining about their
> hardware deadlocking once in a while :-)
>
> --
> Best regards,
> Siarhei Siamashka
>
> --
> You received this message because you are subscribed to the Google Groups
> "linux-sunxi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to linux-sunxi+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.