Hi. On 5 November 2013 02:32, David Kuehling <[email protected]> wrote:
> >>>>> "Graham" == Graham Whaley <[email protected]> writes:
>
> > On 4 November 2013 07:18, David Kuehling <[email protected]> wrote:
> >>>>>> "Andreas" == Andreas Barth <[email protected]> writes:
>
> >> * David Kuehling ([email protected]) [131103 16:00]:
> >>> Since then I've encountered system deadlocks every one-two days (the
> >>> system in question is running continuously 24/7). Deadlock meaning
> >>> that the system seems completely dead, even the num-lock LED cannot
> >>> be toggled any more (but the fan is still spinning etc.).
> >>>
> >>> I never had stability problems on kernel 2.6.39. I did have a
> >>> single deadlock when testing the debian-backports kernel package for
> >>> kernel 3.2.0 on debian squeeze (but I ran that kernel only for about
> >>> 2 days before upgrading to Wheezy).
>
> >> Can you try the old kernel if it happens with the old kernel and new
> >> userland?
> [..]
> > As to the instabilities - it occurs to me that this may be connected
> > to the Loongson 2f 'issues', as documented at [1]. I believe (and btw,
> > would love if somebody could confirm and point me at any archive
> > links) that Debian-mips moved from MIPSI to MIPSII ISA when it went
> > from Squeeze to Wheezy. I'm wondering if maybe that change in code
> > layout may have brought one of these issues to the surface? Or maybe
> > that your kernel or RFS needs to be built with the options listed in
> > the link, and you've been "lucky" so far? As far as I can find out,
> > there is no easy way (apart from maybe looking at the top of the chip
> > :-( ) to tell if you have a 2F01, 2F02 or 2F03 version of the 2F SoC,
> > and only the 2F03 is 'fixed' :-( Anybody know for sure? I'm sure this
> > has probably been discussed before in the past.
>
> > Please feel free to educate me on whether any of these 2F fixes are
> > turned on by default for upstream Debian. I doubt they are? And sorry
> > if I've missed some subtlety here?
>
> Hi Graham,
>
> I had the same thought - the lockups certainly look similar to what I
> experienced when running a Linux kernel compiled from source without the
> Loongson2f instruction fixes enabled in the kernel config (that would
> indicate I have one of the older SoCs).
>
> Looking at the output of objdump -D libc.so, it looks to me like the
> correct "fixed" NOP sequence is used (shown by the disassembler as "move
> at,at", which is synonymous with "or at,at,zero"). So Debian userspace
> looks like it's loongson compatible.
>
> Running objdump -D on the 3.2 kernel image (that's a gzip compressed
> image, so I guess the code I see is only a small bootstrap sequence for
> ungzipping the rest) I can see the right NOP sequence plus the extra
> code in front of indirect jumps (e.g. function return statements). This
> looks like it was compiled with -mfix-loongson2f-jump plus
> -mfix-loongson2f-nop.
>
> So far this looks good. Hopefully we're not hitting new, undocumented
> CPU bugs here.
>
> After finally getting update-initrd to build a working image for my
> 2.6.39.4 kernel, I'm now back to running the same kernel I used for a
> long time with squeeze. If the lockups don't happen by the end of the
> week we'll have another data point.
>

Any luck with this, David? Did it lock up, or is it still running?

 Graham

>
> With 2.6.39.4 BTW I'm not using the loongson2f optimized libc from
> package libc6-loongson2f (only newer kernels seem to supply the right
> hwcap info for ld.so to choose the loongson2f optimized versions of
> libraries). I'll have to run another test to see whether the loongson2f
> libc has anything to do with the lockups.
>
> I noticed that going from linux 2.6.39 to 3.2, the process scheduling
> improved dramatically. On 2.6.39 'nice' values seem to be ignored, and
> output of 'top' often looks wrong, like multiple processes using exactly
> the same CPU amount, without any variation. Maybe newer loongson2f
> kernels changed to using a more accurate clock source for process CPU
> usage accounting. These changes could also be a source of deadlocks.
> Hopefully I'll not have to bisect all linux versions before 3.2 to
> finally solve the issue.
>
> cheers,
>
> David
>
> --
> GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
> Fingerprint: B63B 6AF2 4EEB F033 46F7 7F1D 935E 6F08 E457 205F
>
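
For anyone wanting to repeat the objdump check quoted above without
reading the disassembly by eye, here is a minimal sketch in Python. It
is only an illustration, under these assumptions: the bytes passed in
are raw MIPS code already extracted from a .text section (not a whole
ELF file, where zero words also occur as data), and the byte order is
little-endian (mipsel). The helper names encode_or and count_nops are
made up for this example. It counts plain NOP words versus the
"or at,at,zero" replacement that objdump prints as "move at,at".

```python
import struct

PLAIN_NOP = 0x00000000  # sll zero,zero,0: the ordinary MIPS nop
FIXED_NOP = 0x00200825  # or at,at,zero: objdump shows this as "move at,at"


def encode_or(rd, rs, rt):
    """Encode the MIPS R-type instruction 'or rd,rs,rt' as a 32-bit word."""
    return (rs << 21) | (rt << 16) | (rd << 11) | 0x25


def count_nops(code, little_endian=True):
    """Count plain and Loongson-2F-style nop words in raw code bytes.

    'code' is assumed to hold only instructions; trailing bytes that do
    not fill a whole 32-bit word are ignored.
    """
    nwords = len(code) // 4
    fmt = ("<" if little_endian else ">") + "%dI" % nwords
    words = struct.unpack(fmt, code[: nwords * 4])
    return {"plain": words.count(PLAIN_NOP), "fixed": words.count(FIXED_NOP)}


if __name__ == "__main__":
    # Hand-made sample: fixed nop, plain nop, "jr ra", fixed nop.
    sample = struct.pack("<4I", FIXED_NOP, PLAIN_NOP, 0x03E00008, FIXED_NOP)
    print(count_nops(sample))
```

A non-zero "plain" count in real code would only be a hint that some
object was assembled without the nop workaround, not proof, since this
sketch does not distinguish instructions from embedded data.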

