On Friday, 4 March 2016 10.09:59 h CET Bruno Friedmann wrote:
> On Friday, 4 March 2016 08.52:27 h CET Michal Kubecek wrote:
> > On Thursday, 3 March 2016 19:47 Bruno Friedmann wrote:
> > > 
> > > But on two machines with an Adaptec 6805 RAID controller I'm
> > > getting a kernel backtrace (sorry, no time until today to report it
> > > properly).
> > > Which is curious, as I was running 4.x series kernels on them for a
> > > while. Not a big deal for me, but it can be really tricky to
> > > recover from ...
>  
> > Do you remember what was the last working version? The aacraid has been 
> > backported for SLE12-SP1 so that the version in the evergreen 13.1 
> > kernel differs from current mainline only in 2 or 3 commits which do not 
> > seem very important.
> 
> On one of them I have this history of installed kernels (for a while I was
> using kernel:standard before switching back to evergreen).
> 
> 2014-04-17 22:24:57|kernel-default|3.11.6-4.1|x86_64|root@clochette.disney.interne|openSUSE-13.1-1.10
> 2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
> 2014-04-18 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
> 2014-05-06 18:17:35|kernel-default|3.14.2-1.1.g1474ea5|x86_64||kernel-stable
> 2014-05-20 18:30:49|kernel-default|3.11.10-11.1|x86_64||updates
> 2014-05-20 18:35:06|kernel-default|3.14.4-1.1.gbebeb6f|x86_64||kernel-stable
> 2014-06-06 17:15:46|kernel-default|3.14.4-2.1.g0de0f93|x86_64||kernel-stable
> 2014-07-01 17:38:11|kernel-default|3.15.2-1.1.gfb7c781|x86_64||kernel-stable
> 2014-07-01 17:41:17|kernel-default|3.11.10-17.2|x86_64||updates
> 2014-07-10 18:47:49|kernel-default|3.15.4-1.1.g2b59ae6|x86_64||kernel-stable
> 2014-07-29 18:28:29|kernel-default|3.15.6-2.1.gedc5ddf|x86_64||kernel-stable
> 2014-08-01 17:21:02|kernel-default|3.15.7-1.1.g972d9a6|x86_64||kernel-stable
> 2014-08-13 19:27:38|kernel-default|3.11.10-21.1|x86_64||updates
> 2014-08-13 19:28:49|kernel-default|3.15.8-2.1.g258e3b0|x86_64||kernel-stable
> 2014-09-12 17:56:36|kernel-default|3.16.2-1.1.gdcee397|x86_64||kernel-stable
> 2014-09-30 11:15:49|kernel-default|3.16.3-1.1.gd2bbe7f|x86_64||kernel-stable
> 2014-10-17 18:07:46|kernel-default|3.17.0-1.1.gc467423|x86_64||kernel-stable
> 2014-12-03 17:53:49|kernel-default|3.17.4-2.1.g2d23787|x86_64||kernel-stable
> 2015-01-06 17:54:51|kernel-default|3.18.1-1.1.g5f2f35e|x86_64|root@clochette|kernel-stable
> 2015-01-20 17:59:23|kernel-default|3.18.2-2.1.g88366a3|x86_64||kernel-stable
> 2015-02-03 17:44:26|kernel-default|3.18.5-1.1.gf378da4|x86_64||kernel-stable
> 2015-03-03 17:57:05|kernel-default|3.19.0-4.1.g7f0e735|x86_64||kernel-stable
> 2015-03-11 18:07:01|kernel-default|3.19.1-2.1.gc0946e9|x86_64||kernel-stable
> 2015-03-21 09:05:51|kernel-default|3.19.2-1.1.gf2f9797|x86_64||kernel-stable
> 2015-04-02 17:00:19|kernel-default|3.19.3-1.1.gf10e7fc|x86_64||kernel-stable
> 2015-04-17 19:04:27|kernel-default|3.19.4-1.1.g74c332b|x86_64||kernel-stable
> 2015-05-13 18:15:53|kernel-default|4.0.2-1.1.ga425d38|x86_64||kernel-stable
> 2015-06-02 18:18:37|kernel-default|4.0.4-4.1.gad54361|x86_64||kernel-stable
> 2015-06-16 18:13:00|kernel-default|4.0.5-2.1.g0e899eb|x86_64||kernel-stable
> 2015-07-14 17:59:47|kernel-default|4.1.1-2.1.gcac28b3|x86_64||kernel-stable
> 2015-07-29 13:56:16|kernel-default|4.1.3-5.1.ga0f869c|x86_64||kernel-stable
> 2015-08-11 11:26:26|kernel-default|4.1.4-1.1.ga37e14f|x86_64||kernel-stable
> 2015-08-15 10:53:25|kernel-default|4.1.5-2.1.g83fbd4e|x86_64||kernel-stable
> 2016-02-02 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
> 2016-02-17 18:45:43|kernel-default|3.12.51-2.1|x86_64|root@clochette|kernel-evergreen
> 2016-02-17 19:37:15|kernel-default|3.11.10-34.2|x86_64|root@sysresccd|updates
> 2016-03-01 17:52:16|kernel-default|3.12.53-1.1|x86_64||kernel-evergreen
> 
> The highest version installed was 4.4.0, and the first working kernel
> newer than 3.11 was 3.14.1.
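
For anyone who wants to pull the same summary out of such a list, here is
a minimal sketch. It assumes the six pipe-separated fields seen in the
quoted lines (timestamp, package, version, arch, requester, repository) and
tolerates the mail quoting and the dates wrapped onto their own line. The
names parse_history, upstream_version and KernelInstall are made up for
this illustration, and SAMPLE holds only three of the quoted lines.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class KernelInstall(NamedTuple):
    timestamp: str
    package: str
    version: str
    repo: str


def parse_history(text: str) -> list[KernelInstall]:
    """Parse pipe-delimited kernel lines, tolerating '>' quoting and wrapping."""
    entries = []
    pending = ""  # a date that mail wrapping left alone on its own line
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", line):
            pending = line
            continue
        if "|" not in line:
            continue
        if pending:
            line = pending + " " + line
            pending = ""
        fields = line.split("|")
        if len(fields) < 6:
            continue  # not the assumed six-field layout
        entries.append(KernelInstall(fields[0], fields[1], fields[2], fields[5]))
    return entries


def upstream_version(version: str) -> tuple[int, ...]:
    """'4.4.0-8.1.g9f68b90' -> (4, 4, 0): keep only the part before the dash."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


SAMPLE = """\
> 2014-04-17 23:13:36|kernel-default|3.11.10-7.1|x86_64||updates
> 2014-04-18
> 00:32:55|kernel-default|3.14.1-1.1.geafcebd|x86_64|root@clochette|kernel-stable
> 2016-02-02
> 18:25:21|kernel-default|4.4.0-8.1.g9f68b90|x86_64|root@clochette|kernel-stable
"""

entries = parse_history(SAMPLE)
# Highest upstream version ever installed.
newest = max(entries, key=lambda e: upstream_version(e.version))
# First installed kernel from a series newer than 3.11.
first_newer = next(e for e in entries if upstream_version(e.version)[:2] > (3, 11))
print(newest.version, "|", first_newer.version)
```

Note that this only tells which versions were installed and when; whether a
given kernel actually booted on the 6805 still has to come from memory or
from the logs.
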
> 
> This is how the arcconf tool reports the controller on a plain
> 3.11.10-34-default system:
>    --------------------------------------------------------
>    Controller Version Information 6805
>    --------------------------------------------------------
>    BIOS                                     : 5.2-0 (19147)
>    Firmware                                 : 5.2-0 (19147)
>    Driver                                   : 1.2-0 (30200)
>    Boot Flash                               : 5.2-0 (19147)
> 
> 
> 
> On another machine, which has a different controller and works with 3.12.53:
>    --------------------------------------------------------
>    Controller Version Information 5805
>    --------------------------------------------------------
>    BIOS                                     : 5.2-0 (18948)
>    Firmware                                 : 5.2-0 (18948)
>    Driver                                   : 1.2-1 (40709)
>    Boot Flash                               : 5.2-0 (18948)
> 
> We can see the driver went from 1.2-0 to 1.2-1.
>  
> > > We were able to capture some information:
> > > https://dav.ioda.net/index.php/s/4wyMDlKot3Z1F8w
> > 
> > I'm not really an expert in this area but it looks like an IRQ is 
> > received and handled before all the device data structures are set up 
> > properly (a pointer which is still null is dereferenced).
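
Purely as an illustration of the ordering problem described here, and not
of the actual aacraid code, a toy model in Python: the handler becomes
live before the structure it uses exists, so the first "interrupt" trips
over an empty reference. Adapter, irq_handler, probe_unsafe and probe_safe
are all hypothetical names, and the AttributeError stands in for the NULL
pointer dereference.

```python
class Adapter:
    """Hypothetical stand-in for a controller's per-device state."""

    def __init__(self):
        self.queue = None    # plays the role of a pointer that is still NULL
        self.handler = None  # no interrupt handler registered yet

    def raise_irq(self):
        # The "hardware" may fire as soon as a handler is registered.
        if self.handler is not None:
            self.handler(self)


def irq_handler(dev):
    # Fails with AttributeError if dev.queue has not been set up yet.
    dev.queue.append("response")


def probe_unsafe(dev):
    dev.handler = irq_handler  # handler goes live too early
    dev.raise_irq()            # interrupt arrives before setup is finished
    dev.queue = []             # never reached


def probe_safe(dev):
    dev.queue = []             # finish setting up the data structures first
    dev.handler = irq_handler  # only then allow interrupts to be handled
    dev.raise_irq()


try:
    probe_unsafe(Adapter())
    crashed = False
except AttributeError:
    crashed = True

dev = Adapter()
probe_safe(dev)
print(crashed, dev.queue)
```
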
> > 
> 
> 
> The odd part is that, of the systems on the list, three share almost every
> piece of hardware:
> same motherboard Asus CROSSHAIR V FORMULA-Z, BIOS 2101 04/17/2014
> same RAM TridentX - F3-2400C10D-8GTX - G.SKILL DDR3 Memory x4
> same CPU AMD FX(tm)-8350 Eight-Core Processor
> The main differences: one has an 8805, an Intel PT1000 and an Nvidia GeForce
> GTX 560 (with the nvidia blob); that one works.
> 
> The two failing ones have a 6805 + Intel 10-Gigabit X540-AT2 + Nvidia GT218
> (PCIe x1) with nouveau.
> As the crash message clearly involves aacraid, that is how I deduced that
> the 6800 is the culprit in the stack.
> 
> > > It is not easy to experiment on those servers; I only have a small
> > > free time window ... It seems our controllers are missing a firmware
> > > update, which will be applied next Tuesday night.
> > 
> > Let's see if firmware update changes anything.
> > Michal Kubecek
> 
> Perhaps I can convince the customer to test an update on one of them
> already this weekend.

We were able to upgrade the firmware and then retest 3.12.53, but
unfortunately it crashes at the same place, or just a bit after: aacraid is
not initialized correctly.
The captured error looks the same as before.

I will not bother too much; one of the computers will be upgraded to Leap
soon.
The other one will stay on 3.11 until its upgrade.

But perhaps you could check whether anyone has reported this kind of bug in
bugzilla@suse. I can't access the private bugs against SLE, and I didn't
find anything open on bugzilla.

If this crash has not been reported yet, I will open a bug just for
reference, in case someone else gets hit by it.

https://dav.ioda.net/index.php/s/baFpxW08bGnkM13
https://dav.ioda.net/index.php/s/9abts7bRUUK32AJ

-- 

Bruno Friedmann 
Ioda-Net Sàrl www.ioda-net.ch
 
 openSUSE Member, fsfe fellowship
 GPG KEY : D5C9B751C4653227
 irc: tigerfoot

_______________________________________________
Evergreen mailing list
Evergreen@lists.rosenauer.org
http://lists.rosenauer.org/mailman/listinfo/evergreen
