Birdsarenice posted on Sun, 13 Dec 2015 22:55:19 +0000 as excerpted:

> Meanwhile, I did get lucky: At one crash I happened to be logged in and
> was able to hit dmesg seconds before it went completely. So what I have
> here is information that looks like it'll help you track down a
> rarely-encountered and hard-to-reproduce bug which can cause the system
> to lock up completely in event of certain types of hard drive failure.
> It might be nothing, but perhaps someone will find it of use - because
> it'd be a tricky one to both reproduce and get a good error report if it
> did occur.
>
> I see an 'invalid opcode' error in here, that's pretty unusual

Disclaimer: I'm a list regular and (small-scale) sysadmin, not a dev, and
most certainly not a btrfs dev. Take what I say with that in mind, tho
I've been active on-list for over a year and thus now have a reasonable
level of practical btrfs experience in sysadmin configuration and crisis
recovery.

You could well be quite correct about the unusual crash log and its
value; I'll leave that up to the devs to decide. But that "invalid
opcode: 0000" bit is in fact not at all unusual on btrfs, tho I can say
it fooled me originally as well, because it certainly /looks/ both
suspicious and in general unusual.

Based on how a dev explained it to me, I believe btrfs deliberately
executes an invalid instruction (via the kernel's BUG()/BUG_ON() macros)
to trigger a semi-controlled crash in instances where code that "should
never happen" actually gets executed for some reason, leaving the kernel
in an unknown state and thus not trustworthy enough to reliably write to
storage devices and do a controlled shutdown. As far as I understand it,
on x86 the 0000 printed after "invalid opcode:" is the error code
reported with the trap, not the opcode itself.

That's of course why the tracebacks are there, to help the devs figure
out where it was and what triggered it, but the "invalid opcode: 0000"
line itself is quite frequently found in these tracebacks, because
that's the method chosen to deliberately trigger them.

I'd guess the same technique is used in various other (non-btrfs) kernel
code as well, but in fully stable code it is very rarely seen, precisely
because it /does/ mean the kernel reached code that it is never expected
to reach, meaning something specific went wrong to get to that point. In
fully stable code, it's rare that any code paths leading to that sort of
execution point remain, as they've all been found over the years.

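For anyone sifting through a saved dmesg capture for these deliberate
crashes, here is a minimal Python sketch, not anything the devs use. It
assumes the x86 log wording "kernel BUG at <file>:<line>!" and
"invalid opcode: 0000 [#1]". The function names and the sample log
(path, line number, timestamps) are made up for illustration.

```python
import re

# Wording assumed from x86 kernel logs:
#   "kernel BUG at <file>:<line>!"
#   "invalid opcode: <4 hex digits> [#<oops count>]"
BUG_AT = re.compile(r"kernel BUG at (\S+):(\d+)!")
TRAP = re.compile(r"invalid opcode: ([0-9a-fA-F]{4}) \[#(\d+)\]")


def find_bug_sites(log_text):
    """Return (source_file, line_number) for each 'kernel BUG at' line."""
    return [(m.group(1), int(m.group(2))) for m in BUG_AT.finditer(log_text)]


def find_invalid_opcode_traps(log_text):
    """Return (error_code, oops_count) for each 'invalid opcode' line."""
    return [(m.group(1), int(m.group(2))) for m in TRAP.finditer(log_text)]


# Made-up sample: the path, line number and timestamps are hypothetical.
SAMPLE = """\
[ 1234.000001] ------------[ cut here ]------------
[ 1234.000002] kernel BUG at fs/example/demo.c:123!
[ 1234.000003] invalid opcode: 0000 [#1] SMP
"""

if __name__ == "__main__":
    print(find_bug_sites(SAMPLE))
    print(find_invalid_opcode_traps(SAMPLE))
```

To try it on a real capture, save the log first (for instance with
"dmesg > crash.log") and pass the file's contents to the two functions.
The file:line pair is usually the most useful part to quote when
reporting to the list.
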
But of course btrfs, while no longer experimental, remains "still
stabilizing and maturing, not yet fully stable or mature", so there are
still code paths left that occasionally reach these
intended-to-be-unreachable code points. When that happens, triggering a
crash and hopefully getting a traceback that helps the devs figure out
which code path has the bug and why is a good thing to do, and this is
apparently the way it's done.

(BTW, compliments on the nick and email address. =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman