On 9/25/16 8:37 PM, Rich Freeman wrote:
> On Sun, Sep 25, 2016 at 7:22 PM, Jeff Mahoney <je...@suse.com> wrote:
>> On 9/25/16 9:55 AM, Rich Freeman wrote:
>>> On Fri, Sep 23, 2016 at 12:58 AM, Duncan <1i5t5.dun...@cox.net> wrote:
>>>>
>>>> Btrfs raid1 you say, and you have existing compressed files it's trying
>>>> to read in the backtrace?
>>>>
>>>> Sounds like the issues I see sometimes and have posted about where after
>>>> a crash that resulted in one device of my raid1 pair getting behind the
>>>> other, the kernel will crash if it sees too many csum-errors, even tho
>>>> it's /supposed/ to check the other copy and read from it if valid (which
>>>> it is as a btrfs scrub resolves the issue).
>>>>
>>>> When booted to rescue/single-user mode, can you run a scrub?
>>>
>>> After a few reboots trying to capture the initial panic message (even
>>> when I set panic_on_oops=1 I was getting multiple ones with only the
>>> tainted one staying on screen), the system managed to stay up. I
>>> completed a scrub and it found no errors. I also haven't had any
>>> issues with it but haven't attempted another reboot. I figured the
>>> safest course was to just leave it on for a good week so that whatever
>>> was in the log/etc that was giving it trouble works its way out. I'm
>>> also doing a balance which may or may not help (and which is useful
>>> anyway since I increased the size of the drive I replaced).
>>
>> If it stays up, can you post the initial Oops then?
>>
>
> Unfortunately, it stays up because there is no OOPS. It was crashing
> fairly consistently, but for whatever reason it didn't this time.
> Since I needed the box working and wasn't having a lot of luck
> capturing the OOPS I just let it run with minimal prodding, and
> hopefully it is now in a state where it won't crash.
>
> But, if it happens again I'll try to capture an initial OOPS output,
> and I'll do a memory test in any case (though I really am not
> expecting anything there).
>
> If I were able to get kernel core dumping working on this machine,
> would that contain information about the initial oops. I forget if
> they contain the full ring buffer/etc. I used to have it working but
> some change in either the kernel or the utils was causing issues with
> it. I still boot my kernels with space set aside for the crash
> kernel...

I'm not sure about other distros, but at least with SLES/openSUSE you
can configure kdump to /just/ dump the dmesg.

-Jeff

--
Jeff Mahoney
SUSE Labs
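
For reference, a minimal sketch of pulling only the kernel log out of a crash image by hand, assuming makedumpfile is installed and the script runs in the kdump capture environment, where /proc/vmcore exists. The helper name save_crash_dmesg and the default output path are hypothetical and not taken from this thread; the distro-specific kdump setting mentioned above is not shown here.

```shell
#!/bin/sh
# Hypothetical helper: save only the kernel ring buffer from a crash image.
# makedumpfile's --dump-dmesg option writes just the log to a text file
# instead of producing a full memory dump.
save_crash_dmesg() {
    vmcore=${1:-/proc/vmcore}          # only present in the capture kernel
    out=${2:-/var/crash/dmesg.txt}     # output path is an assumption
    if [ ! -r "$vmcore" ]; then
        echo "cannot read $vmcore (not in the kdump capture kernel?)" >&2
        return 1
    fi
    makedumpfile --dump-dmesg "$vmcore" "$out"
}
```

Calling save_crash_dmesg with no arguments from the capture environment would write the crashed kernel's log to the assumed default path; on a normally booted system it returns 1 because /proc/vmcore is absent.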