On 9/25/16 8:37 PM, Rich Freeman wrote:
> On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <je...@suse.com> wrote:
>> On 9/25/16 9:55 AM, Rich Freeman wrote:
>>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>>>
>>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>>> to read in the backtrace?
>>>>
>>>> Sounds like the issues I see sometimes and have posted about where after
>>>> a crash that resulted in one device of my raid1 pair getting behind the
>>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>>> it's /supposed/ to check the other copy and read from it if valid (which
>>>> it is as a btrfs scrub resolves the issue).
>>>>
>>>> When booted to rescue/single-user mode, can you run a scrub?
>>>
>>> After a few reboots trying to capture the initial panic message (even
>>> when I set panic_on_oops=1 I was getting multiple ones with only the
>>> tainted one staying on screen), the system managed to stay up.  I
>>> completed a scrub and it found no errors.  I also haven't had any
>>> issues with it but haven't attempted another reboot.  I figured the
>>> safest course was to just leave it on for a good week so that whatever
>>> was in the log/etc that was giving it trouble works its way out.  I'm
>>> also doing a balance which may or may not help (and which is useful
>>> anyway since I increased the size of the drive I replaced).
>>
>> If it stays up, can you post the initial Oops then?
>>
> 
> Unfortunately, it stays up because there is no OOPS.  It was crashing
> fairly consistently, but for whatever reason it didn't this time.
> Since I needed the box working and wasn't having a lot of luck
> capturing the OOPS I just let it run with minimal prodding, and
> hopefully it is now in a state where it won't crash.
> 
> But, if it happens again I'll try to capture an initial OOPS output,
> and I'll do a memory test in any case (though I really am not
> expecting anything there).
> 
> If I were able to get kernel core dumping working on this machine,
> would that contain information about the initial oops?  I forget if
> they contain the full ring buffer/etc.  I used to have it working but
> some change in either the kernel or the utils was causing issues with
> it.  I still boot my kernels with space set aside for the crash
> kernel...

I'm not sure about other distros, but at least with SLES/openSUSE you
can configure kdump to /just/ dump the dmesg.
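As a rough illustration, here is a sketch for pulling the initial oops out of such a dump once it exists as a plain-text file. It assumes the file was extracted from a vmcore (makedumpfile's --dump-dmesg option can do that). The function name first_oops and the start/end patterns are assumptions for this sketch and are not part of kdump. In a real log, the "[#1]" counter on the "Oops:" line marks the first report, and later reports are usually fallout from it.

```python
import re
import sys

# Lines that typically open a kernel oops/BUG/WARNING report in dmesg.
# This pattern list is an assumption for the sketch and is not exhaustive.
OOPS_START = re.compile(
    r"(BUG: |kernel BUG at |Oops: |general protection fault|WARNING: CPU:)"
)
# Each report is normally closed by a "---[ end trace <hex id> ]---" line.
OOPS_END = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def first_oops(log_text):
    """Return the lines of the first oops report in log_text, or [] if none.

    The block runs from the first matching start line through the next
    end-trace marker, or to the end of the log if the marker is missing
    (e.g. a truncated capture).
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if OOPS_START.search(line):
            block = []
            for later in lines[i:]:
                block.append(later)
                if OOPS_END.search(later):
                    break
            return block
    return []


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python first_oops.py dmesg.txt
    with open(sys.argv[1], errors="replace") as f:
        print("\n".join(first_oops(f.read())))
```

Keeping only the first report matters here because, as seen above, the later oopses are the ones that stay on screen, but they are reported against an already tainted kernel.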

-Jeff


-- 
Jeff Mahoney
SUSE Labs
