On Friday, 4 March 2016 08:52:27 CET Michal Kubecek wrote:
> On Thursday, 3 March 2016 19:47 Bruno Friedmann wrote:
> >
> > But on two machines with an Adaptec 6805 RAID controller I'm
> > getting a kernel backtrace (sorry, no time until today to report it
> > correctly).
> > Which is curious, as I was running 4.x series kernels on them for
> > a while... Not a big deal for myself, but it can be really tricky to
> > recover from ...
>
> Do you remember what was the last working version? The aacraid has been
> backported for SLE12-SP1 so that the version in the evergreen 13.1
> kernel differs from current mainline only in 2 or 3 commits which do not
> seem very important.

On one of them I have this history of kernels used (for a while I was
using kernel:standard) before switching back to evergreen.

2014-04-17 22:24:57|kernel-default|3.11.6-4.1|x86_64|root@clochette.disney.interne|openSUSE-13.1-1.10
2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
2014-04-18 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
2014-05-06 18:17:35|kernel-default|3.14.2-1.1.g1474ea5|x86_64||kernel-stable
2014-05-20 18:30:49|kernel-default|3.11.10-11.1|x86_64||updates
2014-05-20 18:35:06|kernel-default|3.14.4-1.1.gbebeb6f|x86_64||kernel-stable
2014-06-06 17:15:46|kernel-default|3.14.4-2.1.g0de0f93|x86_64||kernel-stable
2014-07-01 17:38:11|kernel-default|3.15.2-1.1.gfb7c781|x86_64||kernel-stable
2014-07-01 17:41:17|kernel-default|3.11.10-17.2|x86_64||updates
2014-07-10 18:47:49|kernel-default|3.15.4-1.1.g2b59ae6|x86_64||kernel-stable
2014-07-29 18:28:29|kernel-default|3.15.6-2.1.gedc5ddf|x86_64||kernel-stable
2014-08-01 17:21:02|kernel-default|3.15.7-1.1.g972d9a6|x86_64||kernel-stable
2014-08-13 19:27:38|kernel-default|3.11.10-21.1|x86_64||updates
2014-08-13 19:28:49|kernel-default|3.15.8-2.1.g258e3b0|x86_64||kernel-stable
2014-09-12 17:56:36|kernel-default|3.16.2-1.1.gdcee397|x86_64||kernel-stable
2014-09-30 11:15:49|kernel-default|3.16.3-1.1.gd2bbe7f|x86_64||kernel-stable
2014-10-17 18:07:46|kernel-default|3.17.0-1.1.gc467423|x86_64||kernel-stable
2014-12-03 17:53:49|kernel-default|3.17.4-2.1.g2d23787|x86_64||kernel-stable
2015-01-06 17:54:51|kernel-default|3.18.1-1.1.g5f2f35e|x86_64|root@clochette|kernel-stable
2015-01-20 17:59:23|kernel-default|3.18.2-2.1.g88366a3|x86_64||kernel-stable
2015-02-03 17:44:26|kernel-default|3.18.5-1.1.gf378da4|x86_64||kernel-stable
2015-03-03 17:57:05|kernel-default|3.19.0-4.1.g7f0e735|x86_64||kernel-stable
2015-03-11 18:07:01|kernel-default|3.19.1-2.1.gc0946e9|x86_64||kernel-stable
2015-03-21 09:05:51|kernel-default|3.19.2-1.1.gf2f9797|x86_64||kernel-stable
2015-04-02 17:00:19|kernel-default|3.19.3-1.1.gf10e7fc|x86_64||kernel-stable
2015-04-17 19:04:27|kernel-default|3.19.4-1.1.g74c332b|x86_64||kernel-stable
2015-05-13 18:15:53|kernel-default|4.0.2-1.1.ga425d38|x86_64||kernel-stable
2015-06-02 18:18:37|kernel-default|4.0.4-4.1.gad54361|x86_64||kernel-stable
2015-06-16 18:13:00|kernel-default|4.0.5-2.1.g0e899eb|x86_64||kernel-stable
2015-07-14 17:59:47|kernel-default|4.1.1-2.1.gcac28b3|x86_64||kernel-stable
2015-07-29 13:56:16|kernel-default|4.1.3-5.1.ga0f869c|x86_64||kernel-stable
2015-08-11 11:26:26|kernel-default|4.1.4-1.1.ga37e14f|x86_64||kernel-stable
2015-08-15 10:53:25|kernel-default|4.1.5-2.1.g83fbd4e|x86_64||kernel-stable
2016-02-02 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
2016-02-17 18:45:43|kernel-default|3.12.51-2.1|x86_64|root@clochette|kernel-evergreen
2016-02-17 19:37:15|kernel-default|3.11.10-34.2|x86_64|root@sysresccd|updates
2016-03-01 17:52:16|kernel-default|3.12.53-1.1|x86_64||kernel-evergreen

The highest version installed was 4.4.0, and the first working one
above 3.11 was 3.14.1.

This is how the arcconf tool sees the controller and system on a plain
3.11.10-34-default:

--------------------------------------------------------
Controller Version Information 6805
--------------------------------------------------------
BIOS       : 5.2-0 (19147)
Firmware   : 5.2-0 (19147)
Driver     : 1.2-0 (30200)
Boot Flash : 5.2-0 (19147)

On another one, which has a different controller but a working 3.12.53:

--------------------------------------------------------
Controller Version Information 5805
--------------------------------------------------------
BIOS       : 5.2-0 (18948)
Firmware   : 5.2-0 (18948)
Driver     : 1.2-1 (40709)
Boot Flash : 5.2-0 (18948)

We can see the driver got an update from 1.2-0 to 1.2-1.

> > We were able to capture some information,
> > https://dav.ioda.net/index.php/s/4wyMDlKot3Z1F8w
>
> I'm not really an expert in this area but it looks like an IRQ is
> received and handled before all the device data structures are
> set up properly (a pointer which is still null is dereferenced).
>

The funniest part is that three of the systems on the list share almost
every piece of hardware:
same motherboard: Asus CROSSHAIR V FORMULA-Z, BIOS 2101 04/17/2014
same RAM: TridentX - F3-2400C10D-8GTX - G.SKILL DDR3 Memory x4
same CPU: AMD FX(tm)-8350 Eight-Core Processor

The main differences are:
one has an 8805 and Intel PT1000 + Nvidia GeForce GTX 560 (with the
nvidia blob) (working),
and the two failing ones have a 6805 + Intel 10-Gigabit X540-AT2 +
Nvidia GT218 (PCIe x1) with nouveau.

As the crash message really involves aacraid, that's how I deduced
that the 6800 is the culprit in the stack.

> > It is not easy to play with those servers, I've only a small free
> > timeframe ... It seems our controllers are missing a firmware update
> > which will be made next Tuesday night.
>
> Let's see if firmware update changes anything.
>
> Michal Kubecek

Perhaps I can convince the customer to run an update test on one of them
already this weekend.

-- 
Bruno Friedmann
Ioda-Net Sàrl www.ioda-net.ch
openSUSE Member, fsfe fellowship
GPG KEY : D5C9B751C4653227
irc: tigerfoot

_______________________________________________
Evergreen mailing list
Evergreen@lists.rosenauer.org
http://lists.rosenauer.org/mailman/listinfo/evergreen