Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-09-01 Thread Chris Murphy
On Fri, Sep 1, 2017 at 7:38 AM, Eric Wolf <19w...@gmail.com> wrote:
> Okay,
> I have a hex editor open. Now what? Your instructions seem
> straightforward, but I have no idea what I'm doing.

First step: back up as much as you can, because if you don't know what
you're doing, there's a good chance you'll make a mistake and break the
file system. So be prepared for making things worse.
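
If you go ahead, it is also worth keeping a copy of the damaged leaf itself, so a botched edit can be undone byte for byte. Below is a minimal Python sketch, in Python because that's what Hugo's snippets use. It assumes you already have the leaf's physical byte offset from btrfs-map-logical, as described next, and the default 16 KiB nodesize. `save_leaf` and the output file name are hypothetical, and reading a block device needs root.

```python
import os

NODESIZE = 16 * 1024  # assumed default btrfs nodesize (leaf size) in bytes


def save_leaf(device: str, physical: int, out_path: str,
              nodesize: int = NODESIZE) -> bytes:
    """Copy one tree block, read from `device` at byte offset `physical`,
    into `out_path` and return its contents (hypothetical helper)."""
    fd = os.open(device, os.O_RDONLY)
    try:
        data = os.pread(fd, nodesize, physical)
    finally:
        os.close(fd)
    if len(data) != nodesize:
        raise OSError(f"short read: got {len(data)} of {nodesize} bytes")
    with open(out_path, "wb") as f:
        f.write(data)
    return data
```

For example, `save_leaf("/dev/sda2", physical, "leaf-293438636032.bin")`, with your own value for `physical`. Keep the copy somewhere other than the damaged file system.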

Next, you need to get the physical sector(s) for the leaf containing
the error. Use btrfs-map-logical -l <address> for this, where address
is the same one you used before with btrfs-debug-tree. If this is a
default file system, the leaf is 16KiB, which is 32 sectors of 512
bytes. btrfs-map-logical reports the leaf's physical location on the
device; to turn that into sectors you need the drive's logical sector
size, which is 512 bytes on a 512e drive (as most drives are), but you
should make sure.

]$ sudo blockdev --getss /dev/nvme0n1
512
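
The sector arithmetic, as a small sketch. It assumes that the number you start from is the byte offset that btrfs-map-logical prints for the leaf (relative to the device it names, /dev/sda2 here) and that the nodesize is the default 16 KiB. The offset in the demo is a made-up placeholder, not a value from this thread.

```python
NODESIZE = 16 * 1024  # default btrfs nodesize (leaf size) in bytes


def leaf_sectors(physical: int, sector_size: int, nodesize: int = NODESIZE) -> range:
    """Sector numbers covering a whole leaf that starts at byte `physical`."""
    if physical % sector_size:
        raise ValueError("leaf does not start on a sector boundary")
    first = physical // sector_size
    return range(first, first + nodesize // sector_size)


def sector_of_byte(physical: int, byte_in_leaf: int, sector_size: int) -> int:
    """Sector number holding byte `byte_in_leaf` of the leaf."""
    return (physical + byte_in_leaf) // sector_size


if __name__ == "__main__":
    physical = 135290880  # placeholder; use the value btrfs-map-logical prints
    for sector_size in (512, 4096):
        sectors = leaf_sectors(physical, sector_size)
        print(f"{sector_size}-byte sectors: {len(sectors)} sectors, "
              f"{sectors.start} to {sectors[-1]}")
```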

If it's 512 bytes, your bad item is in one of 32 sectors starting from
the location reported by btrfs-map-logical. If it's 4096 bytes, it's in
one of four sectors starting from that location. Find out which sector
the bad key is in. You have to do that by looking at the hex, literally
going byte by byte from the beginning and learning how to parse the
on-disk format as you go. Then fix the bad item objectid as described
by Hugo.

After you fix that, mount the volume and scrub (or read a file that
will trigger the problem, if you can). This time you'll get a bad csum
error instead of the original corrupt leaf error, and that error will
tell you what csum it found and what csum it expects. So now you go
back to the first sector, which is where the csum is stored, find it,
and replace it with the expected csum. Maybe (I'm not sure) the csum in
the error will be printed in decimal, in which case you'd have to
convert it to hex to find the bad one and replace it with the good one.
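
To make "byte by byte from the beginning" a little less painful: as I read the packed structs in ctree.h, a leaf starts with a 101-byte struct btrfs_header, followed by an array of 25-byte struct btrfs_item entries, each beginning with a 17-byte struct btrfs_disk_key (le64 objectid, u8 type, le64 offset). So slot N's objectid should sit at byte 101 + 25*N of the leaf. The sketch below computes that and patches a copy of the leaf in memory. Double-check the layout against your own ctree.h and a hex dump before trusting it; the function names are made up.

```python
import struct

# Layout taken from the packed structs in ctree.h (double-check against yours):
HEADER_SIZE = 101  # struct btrfs_header: csum[32] fsid[16] bytenr flags
                   # chunk_tree_uuid[16] generation owner nritems level
ITEM_SIZE = 25     # struct btrfs_item: btrfs_disk_key (17) + offset + size
KEY = struct.Struct("<QBQ")  # btrfs_disk_key: le64 objectid, u8 type, le64 offset


def key_offset(slot: int) -> int:
    """Byte offset, from the start of the leaf, of the key in `slot`."""
    return HEADER_SIZE + slot * ITEM_SIZE


def read_key(leaf: bytes, slot: int) -> tuple:
    """Return (objectid, type, offset) for `slot`."""
    return KEY.unpack_from(leaf, key_offset(slot))


def patch_objectid(leaf: bytearray, slot: int, expected_old: int, new: int) -> None:
    """Rewrite one key's objectid in place, refusing if the old value differs."""
    objectid, key_type, offset = read_key(leaf, slot)
    if objectid != expected_old:
        raise ValueError(f"slot {slot} holds objectid {objectid}, not {expected_old}")
    KEY.pack_into(leaf, key_offset(slot), new, key_type, offset)


if __name__ == "__main__":
    bad, good = 856762, 890554  # item 12's objectid as found / as it should be
    print("slot 12 key starts at leaf byte", key_offset(12))
    print("bytes now:   ", bad.to_bytes(8, "little").hex(" "))
    print("bytes wanted:", good.to_bytes(8, "little").hex(" "))
```

If those layout assumptions hold, item 12's key starts at byte 401 of the leaf and the only byte that differs is byte 402 (0x12 on disk, 0x96 wanted). That puts it in the first sector for both 512- and 4096-byte sectors.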

Tedious. Maybe Hans van Kranenburg's python-btrfs is easier; I have no idea.


-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-09-01 Thread Eric Wolf
Okay,
I have a hex editor open. Now what? Your instructions seem
straightforward, but I have no idea what I'm doing.
---
Eric Wolf
(201) 316-6098
19w...@gmail.com


On Thu, Aug 31, 2017 at 4:11 PM, Hugo Mills  wrote:
> On Thu, Aug 31, 2017 at 03:21:07PM -0400, Eric Wolf wrote:
>> I've previously confirmed it's a bad RAM module, which I have already
>> submitted an RMA for. Any advice for manually fixing the bits?
>
>What I'd do... use a hex editor and the contents of ctree.h as
> documentation to find the byte in question, change it back to what it
> should be, mount the FS, try reading the directory again, look up the
> csum failure in dmesg, edit the block again to fix up the csum, and
> it's done. (Yes, I've done this before, and I'm a massive nerd).
>
>It's also possible to use Hans van Kranenburg's python-btrfs to fix
> up this kind of thing, but I've not done it myself. There should be a
> couple of talk-throughs from Hans in various archives -- both this
> list (find it on, say, http://www.spinics.net/lists/linux-btrfs/), and
> on the IRC archives (http://logs.tvrrug.org.uk/logs/%23btrfs/latest.html).
>
>> Sorry for top leveling, not sure how mailing lists work (again sorry
>> if this message is top leveled, how do I ensure it's not?)
>
>Just write your answers _after_ the quoted text that you're
> replying to, not before. It's a convention, rather than a technical
> thing...
>
>Hugo.
>
>> ---
>> Eric Wolf
>> (201) 316-6098
>> 19w...@gmail.com
>>
>>
>> On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills  wrote:
>> >(Please don't top-post; edited for conversation flow)
>> >
>> > On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
>> >> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills  wrote:
>> >> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> >> >> I'm having issues with a bad block(?) on my root ssd.
>> >> >>
>> >> >> dmesg is consistently outputting "BTRFS critical (device sda2):
>> >> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>> >> >>
>> >> >> "btrfs scrub stat /" outputs "scrub status for 
>> >> >> b2c9ff7b-[snip]-48a02cc4f508
>> >> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> >> >> total bytes scrubbed: 53.41GiB with 2 errors
>> >> >> error details: verify=2
>> >> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>> >> >>
>> >> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> >> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> >> >> 100% and disk activity remains at 0.
>> >> >
>> >> >This error is usually attributable to bad hardware. Typically RAM,
>> >> > but might also be marginal power regulation (blown capacitor
>> >> > somewhere) or a slightly broken CPU.
>> >> >
>> >> >Can you show us the output of "btrfs-debug-tree -b 293438636032 
>> >> > /dev/sda2"?
>> >
>> >Here's the culprit:
>> >
>> > [snip]
>> >> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>> >>inline extent data size 248 ram 248 compress 0
>> >> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>> >>inode generation 5386763 transid 5386764 size 135 nbytes 135
>> >>block group 0 mode 100644 links 1 uid 10 gid 10
>> >>rdev 0 flags 0x0
>> >> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>> >>inode ref index 2745 namelen 19 name: dpkg.statoverride.0
>> >> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>> >>inline extent data size 135 ram 135 compress 0
>> > [snip]
>> >
>> >Note the objectid field -- the first number in the brackets after
>> > "key" for each item. This sequence of values should be non-decreasing.
>> > Thus, item 12 should have an objectid of 890554 to match the items
>> > either side of it, and instead it has 856762.
>> >
>> >In hex, these are:
>> >
>> > >>> hex(890554)
>> > '0xd96ba'
>> > >>> hex(856762)
>> > '0xd12ba'
>> >
>> >Which means you've had two bitflips close together:
>> >
>> > >>> hex(856762 ^ 890554)
>> > '0x8400'
>> >
>> >Given that everything else is OK, and it's just one byte affected
>> > in the middle of a load of data that's really quite sensitive to
>> > errors, it's very unlikely that it's the result of a misplaced pointer
>> > in the kernel, or some other subsystem accidentally walking over that
>> > piece of RAM. It is, therefore, almost certainly your hardware that's
>> > at fault.
>> >
>> >I would strongly suggest running memtest86 on your machine -- I'd
>> > usually say a minimum of 8 hours, or longer if you possibly can (24
>> > hours), or until you have errors reported. If you get errors reported
>> > in the same place on multiple passes, then it's the RAM. If you have
>> > errors scattered around seemingly at random, then it's probably your
>> > power regulation (PSU or motherboard).
>> >
>> >Sadly, btrfs check on its own won't be able to fix this, 

Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Hugo Mills
On Thu, Aug 31, 2017 at 03:21:07PM -0400, Eric Wolf wrote:
> I've previously confirmed it's a bad RAM module, which I have already
> submitted an RMA for. Any advice for manually fixing the bits?

   What I'd do... use a hex editor and the contents of ctree.h as
documentation to find the byte in question, change it back to what it
should be, mount the FS, try reading the directory again, look up the
csum failure in dmesg, edit the block again to fix up the csum, and
it's done. (Yes, I've done this before, and I'm a massive nerd).
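
For the "fix up the csum" step, the expected value can also be recomputed rather than copied out of dmesg. This sketch assumes the default crc32c checksum type (the only one in kernels of this era) and assumes the tree-block rule as I understand it: standard CRC-32C over bytes 32 through nodesize-1 of the block, stored little-endian in the first four bytes of the 32-byte csum area, with the rest zero. Python's zlib.crc32 is plain CRC-32, which uses a different polynomial, so a small CRC-32C is included. The helper names are made up.

```python
import struct

CSUM_SIZE = 32  # BTRFS_CSUM_SIZE: bytes reserved for the csum at the start of the header


def _make_table() -> list:
    # Lookup table for the reflected CRC-32C (Castagnoli) polynomial.
    poly = 0x82F63B78
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ poly if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c(data: bytes) -> int:
    """Standard CRC-32C: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ _TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


def leaf_csum(leaf: bytes) -> int:
    """Checksum a tree block should carry, assuming the crc32c csum type."""
    return crc32c(leaf[CSUM_SIZE:])


def fix_leaf_csum(leaf: bytearray) -> int:
    """Write the recomputed checksum into the leaf header and return it."""
    csum = leaf_csum(leaf)
    leaf[0:CSUM_SIZE] = struct.pack("<I", csum) + bytes(CSUM_SIZE - 4)
    return csum
```

Run it on a copy of the leaf after fixing the key, and compare the result with what scrub or dmesg reports before writing anything back. If dmesg prints the checksum as a single 32-bit hex number, expect its bytes to appear reversed in the hex editor on a little-endian machine.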

   It's also possible to use Hans van Kranenburg's python-btrfs to fix
up this kind of thing, but I've not done it myself. There should be a
couple of talk-throughs from Hans in various archives -- both this
list (find it on, say, http://www.spinics.net/lists/linux-btrfs/), and
on the IRC archives (http://logs.tvrrug.org.uk/logs/%23btrfs/latest.html).

> Sorry for top leveling, not sure how mailing lists work (again sorry
> if this message is top leveled, how do I ensure it's not?)

   Just write your answers _after_ the quoted text that you're
replying to, not before. It's a convention, rather than a technical
thing...

   Hugo.

> ---
> Eric Wolf
> (201) 316-6098
> 19w...@gmail.com
> 
> 
> On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills  wrote:
> >(Please don't top-post; edited for conversation flow)
> >
> > On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
> >> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills  wrote:
> >> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
> >> >> I'm having issues with a bad block(?) on my root ssd.
> >> >>
> >> >> dmesg is consistently outputting "BTRFS critical (device sda2):
> >> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
> >> >>
> >> >> "btrfs scrub stat /" outputs "scrub status for 
> >> >> b2c9ff7b-[snip]-48a02cc4f508
> >> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
> >> >> total bytes scrubbed: 53.41GiB with 2 errors
> >> >> error details: verify=2
> >> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
> >> >>
> >> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
> >> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
> >> >> 100% and disk activity remains at 0.
> >> >
> >> >This error is usually attributable to bad hardware. Typically RAM,
> >> > but might also be marginal power regulation (blown capacitor
> >> > somewhere) or a slightly broken CPU.
> >> >
> >> >Can you show us the output of "btrfs-debug-tree -b 293438636032 
> >> > /dev/sda2"?
> >
> >Here's the culprit:
> >
> > [snip]
> >> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
> >>inline extent data size 248 ram 248 compress 0
> >> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
> >>inode generation 5386763 transid 5386764 size 135 nbytes 135
> >>block group 0 mode 100644 links 1 uid 10 gid 10
> >>rdev 0 flags 0x0
> >> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
> >>inode ref index 2745 namelen 19 name: dpkg.statoverride.0
> >> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
> >>inline extent data size 135 ram 135 compress 0
> > [snip]
> >
> >Note the objectid field -- the first number in the brackets after
> > "key" for each item. This sequence of values should be non-decreasing.
> > Thus, item 12 should have an objectid of 890554 to match the items
> > either side of it, and instead it has 856762.
> >
> >In hex, these are:
> >
> > >>> hex(890554)
> > '0xd96ba'
> > >>> hex(856762)
> > '0xd12ba'
> >
> >Which means you've had two bitflips close together:
> >
> > >>> hex(856762 ^ 890554)
> > '0x8400'
> >
> >Given that everything else is OK, and it's just one byte affected
> > in the middle of a load of data that's really quite sensitive to
> > errors, it's very unlikely that it's the result of a misplaced pointer
> > in the kernel, or some other subsystem accidentally walking over that
> > piece of RAM. It is, therefore, almost certainly your hardware that's
> > at fault.
> >
> >I would strongly suggest running memtest86 on your machine -- I'd
> > usually say a minimum of 8 hours, or longer if you possibly can (24
> > hours), or until you have errors reported. If you get errors reported
> > in the same place on multiple passes, then it's the RAM. If you have
> > errors scattered around seemingly at random, then it's probably your
> > power regulation (PSU or motherboard).
> >
> >Sadly, btrfs check on its own won't be able to fix this, as it's
> > two bits flipped. (It can cope with one bit flipped in the key, most
> > of the time, but not two). It can be fixed manually, if you're
> > familiar with a hex editor and the on-disk data structures.
> >
> >Hugo.
> >

-- 
Hugo Mills | "There's a Martian war machine outside -- they want
hugo@... carfax.org.uk | to talk to you 

Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Eric Wolf
I've previously confirmed it's a bad RAM module, which I have already
submitted an RMA for. Any advice for manually fixing the bits?

Sorry for top leveling, not sure how mailing lists work (again sorry
if this message is top leveled, how do I ensure it's not?)
---
Eric Wolf
(201) 316-6098
19w...@gmail.com


On Thu, Aug 31, 2017 at 2:59 PM, Hugo Mills  wrote:
>(Please don't top-post; edited for conversation flow)
>
> On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
>> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills  wrote:
>> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> >> I'm having issues with a bad block(?) on my root ssd.
>> >>
>> >> dmesg is consistently outputting "BTRFS critical (device sda2):
>> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>> >>
>> >> "btrfs scrub stat /" outputs "scrub status for 
>> >> b2c9ff7b-[snip]-48a02cc4f508
>> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> >> total bytes scrubbed: 53.41GiB with 2 errors
>> >> error details: verify=2
>> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>> >>
>> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> >> 100% and disk activity remains at 0.
>> >
>> >This error is usually attributable to bad hardware. Typically RAM,
>> > but might also be marginal power regulation (blown capacitor
>> > somewhere) or a slightly broken CPU.
>> >
>> >Can you show us the output of "btrfs-debug-tree -b 293438636032 
>> > /dev/sda2"?
>
>Here's the culprit:
>
> [snip]
>> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>>inline extent data size 248 ram 248 compress 0
>> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>>inode generation 5386763 transid 5386764 size 135 nbytes 135
>>block group 0 mode 100644 links 1 uid 10 gid 10
>>rdev 0 flags 0x0
>> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>>inode ref index 2745 namelen 19 name: dpkg.statoverride.0
>> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>>inline extent data size 135 ram 135 compress 0
> [snip]
>
>Note the objectid field -- the first number in the brackets after
> "key" for each item. This sequence of values should be non-decreasing.
> Thus, item 12 should have an objectid of 890554 to match the items
> either side of it, and instead it has 856762.
>
>In hex, these are:
>
> >>> hex(890554)
> '0xd96ba'
> >>> hex(856762)
> '0xd12ba'
>
>Which means you've had two bitflips close together:
>
> >>> hex(856762 ^ 890554)
> '0x8400'
>
>Given that everything else is OK, and it's just one byte affected
> in the middle of a load of data that's really quite sensitive to
> errors, it's very unlikely that it's the result of a misplaced pointer
> in the kernel, or some other subsystem accidentally walking over that
> piece of RAM. It is, therefore, almost certainly your hardware that's
> at fault.
>
>I would strongly suggest running memtest86 on your machine -- I'd
> usually say a minimum of 8 hours, or longer if you possibly can (24
> hours), or until you have errors reported. If you get errors reported
> in the same place on multiple passes, then it's the RAM. If you have
> errors scattered around seemingly at random, then it's probably your
> power regulation (PSU or motherboard).
>
>Sadly, btrfs check on its own won't be able to fix this, as it's
> two bits flipped. (It can cope with one bit flipped in the key, most
> of the time, but not two). It can be fixed manually, if you're
> familiar with a hex editor and the on-disk data structures.
>
>Hugo.
>
> --
> Hugo Mills | "You got very nice eyes, Deedee. Never noticed them
> hugo@... carfax.org.uk | before. They real?"
> http://carfax.org.uk/  |
> PGP: E2AB1DE4  | Don Logan, Sexy Beast


Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Hugo Mills
   (Please don't top-post; edited for conversation flow)

On Thu, Aug 31, 2017 at 02:44:39PM -0400, Eric Wolf wrote:
> On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills  wrote:
> > On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
> >> I'm having issues with a bad block(?) on my root ssd.
> >>
> >> dmesg is consistently outputting "BTRFS critical (device sda2):
> >> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
> >>
> >> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
> >> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
> >> total bytes scrubbed: 53.41GiB with 2 errors
> >> error details: verify=2
> >> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
> >>
> >> Running "btrfs check --repair /dev/sda2" from a live system stalls
> >> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
> >> 100% and disk activity remains at 0.
> >
> >This error is usually attributable to bad hardware. Typically RAM,
> > but might also be marginal power regulation (blown capacitor
> > somewhere) or a slightly broken CPU.
> >
> >Can you show us the output of "btrfs-debug-tree -b 293438636032 
> > /dev/sda2"?

   Here's the culprit:

[snip]
> item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
>inline extent data size 248 ram 248 compress 0
> item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
>inode generation 5386763 transid 5386764 size 135 nbytes 135
>block group 0 mode 100644 links 1 uid 10 gid 10
>rdev 0 flags 0x0
> item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
>inode ref index 2745 namelen 19 name: dpkg.statoverride.0
> item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
>inline extent data size 135 ram 135 compress 0
[snip]

   Note the objectid field -- the first number in the brackets after
"key" for each item. This sequence of values should be non-decreasing.
Thus, item 12 should have an objectid of 890554 to match the items
either side of it, and instead it has 856762.
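
A quick way to spot this kind of breakage in a btrfs-debug-tree dump is to compare each full key (objectid, type, offset) with the one before it. Within a leaf the keys should be strictly increasing, which implies the non-decreasing objectids described above. The sketch below parses the "item N key (...)" lines. Its type-name map is a partial list of ctree.h values that covers only the types in this leaf.

```python
import re

# Partial map of key type names to their ctree.h values (only those in this leaf).
KEY_TYPES = {"INODE_ITEM": 1, "INODE_REF": 12, "EXTENT_DATA": 108}

ITEM_RE = re.compile(r"item (\d+) key \((\d+) (\w+) (\d+)\)")


def parse_keys(dump: str) -> list:
    """Extract (slot, (objectid, type, offset)) pairs from debug-tree text."""
    keys = []
    for m in ITEM_RE.finditer(dump):
        slot, objectid, type_name, offset = m.groups()
        keys.append((int(slot), (int(objectid), KEY_TYPES[type_name], int(offset))))
    return keys


def out_of_order(keys: list) -> list:
    """Slots whose key is not strictly greater than the previous slot's key."""
    return [slot for (_, prev), (slot, cur) in zip(keys, keys[1:]) if cur <= prev]


if __name__ == "__main__":
    dump = """
    item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
    item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
    item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
    item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
    """
    print(out_of_order(parse_keys(dump)))
```

On the four items quoted above it flags slot 12. The kernel message names slot=11, which I take to be the first slot of the out-of-order pair.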

   In hex, these are:

>>> hex(890554)
'0xd96ba'
>>> hex(856762)
'0xd12ba'

   Which means you've had two bitflips close together:

>>> hex(856762 ^ 890554)
'0x8400'
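
Taking that arithmetic one step further confirms that both flipped bits fall in the same byte of the eight-byte objectid as it is stored on disk (little-endian), which is why this looks like a single damaged byte. A small sketch:

```python
def flipped_bits(a: int, b: int) -> list:
    """Bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def flipped_bytes(a: int, b: int, width: int = 8) -> list:
    """Indices of bytes that differ in the little-endian encodings of a and b."""
    ea, eb = a.to_bytes(width, "little"), b.to_bytes(width, "little")
    return [i for i in range(width) if ea[i] != eb[i]]


if __name__ == "__main__":
    good, bad = 890554, 856762  # item 12's expected and on-disk objectid
    print("bits flipped:", flipped_bits(good, bad))
    print("bytes affected:", flipped_bytes(good, bad))
```

For these two values it reports bits 10 and 15, both in byte 1.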

   Given that everything else is OK, and it's just one byte affected
in the middle of a load of data that's really quite sensitive to
errors, it's very unlikely that it's the result of a misplaced pointer
in the kernel, or some other subsystem accidentally walking over that
piece of RAM. It is, therefore, almost certainly your hardware that's
at fault.

   I would strongly suggest running memtest86 on your machine -- I'd
usually say a minimum of 8 hours, or longer if you possibly can (24
hours), or until you have errors reported. If you get errors reported
in the same place on multiple passes, then it's the RAM. If you have
errors scattered around seemingly at random, then it's probably your
power regulation (PSU or motherboard).

   Sadly, btrfs check on its own won't be able to fix this, as it's
two bits flipped. (It can cope with one bit flipped in the key, most
of the time, but not two). It can be fixed manually, if you're
familiar with a hex editor and the on-disk data structures.
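
To show why the number of flipped bits matters, here is a hypothetical brute-force search. It is my own illustration and not how btrfs check works internally. It flips every combination of one or two bits in the bad objectid and keeps the results that would sort strictly between the neighbouring keys. The key type numbers are the ctree.h values for the three types involved.

```python
from itertools import combinations

# ctree.h key type values for the types in this leaf.
INODE_ITEM, INODE_REF, EXTENT_DATA = 1, 12, 108


def repair_candidates(prev_key, bad_key, next_key, max_bits=2, width=64):
    """For each flip count up to max_bits, list the objectids reachable by
    flipping that many bits of bad_key's objectid that would place the key
    strictly between its neighbours."""
    objectid, key_type, offset = bad_key
    found = {}
    for nbits in range(1, max_bits + 1):
        hits = []
        for bits in combinations(range(width), nbits):
            mask = sum(1 << b for b in bits)
            if prev_key < (objectid ^ mask, key_type, offset) < next_key:
                hits.append(objectid ^ mask)
        found[nbits] = hits
    return found


if __name__ == "__main__":
    prev_key = (890554, INODE_ITEM, 0)    # item 11
    bad_key = (856762, INODE_REF, 31762)  # item 12 as found on disk
    next_key = (890554, EXTENT_DATA, 0)   # item 13
    print(repair_candidates(prev_key, bad_key, next_key))
```

For this leaf it finds no one-bit candidate and exactly one two-bit candidate, 890554, which agrees with the reading above. On another leaf a two-bit search could return several candidates, so any answer still needs checking by hand.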

   Hugo.

-- 
Hugo Mills | "You got very nice eyes, Deedee. Never noticed them
hugo@... carfax.org.uk | before. They real?"
http://carfax.org.uk/  |
PGP: E2AB1DE4  | Don Logan, Sexy Beast




Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Eric Wolf
Also, I know it was caused by bad RAM, and that RAM has since been removed.
---
Eric Wolf
(201) 316-6098
19w...@gmail.com


On Thu, Aug 31, 2017 at 2:33 PM, Hugo Mills  wrote:
> On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
>> I'm having issues with a bad block(?) on my root ssd.
>>
>> dmesg is consistently outputting "BTRFS critical (device sda2):
>> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
>>
>> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
>> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
>> total bytes scrubbed: 53.41GiB with 2 errors
>> error details: verify=2
>> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
>>
>> Running "btrfs check --repair /dev/sda2" from a live system stalls
>> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
>> 100% and disk activity remains at 0.
>
>This error is usually attributable to bad hardware. Typically RAM,
> but might also be marginal power regulation (blown capacitor
> somewhere) or a slightly broken CPU.
>
>Can you show us the output of "btrfs-debug-tree -b 293438636032 /dev/sda2"?
>
>Hugo.
>
> --
> Hugo Mills | "You got very nice eyes, Deedee. Never noticed them
> hugo@... carfax.org.uk | before. They real?"
> http://carfax.org.uk/  |
> PGP: E2AB1DE4  | Don Logan, Sexy Beast


Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Eric Wolf
leaf 293438636032 items 153 free space 2820 generation 5389981 owner 267
fs uuid b2c9ff7b-[snip]-48a02cc4f508
chunk uuid e60d16b9-ca53-45b3-a47a-e0a146046894
    item 0 key (890550 INODE_REF 31762) itemoff 16260 itemsize 23
        inode ref index 2727 namelen 13 name: dpkg.status.0
    item 1 key (890550 EXTENT_DATA 0) itemoff 16207 itemsize 53
        extent data disk byte 243952738304 nr 864256
        extent data offset 0 nr 864256 ram 864256
        extent compression 0
    item 2 key (890551 INODE_ITEM 0) itemoff 16047 itemsize 160
        inode generation 5386763 transid 5386764 size 209058 nbytes 212992
        block group 0 mode 100644 links 1 uid 10 gid 10
        rdev 0 flags 0x0
    item 3 key (890551 INODE_REF 31762) itemoff 16021 itemsize 26
        inode ref index 2726 namelen 16 name: dpkg.status.1.gz
    item 4 key (890551 EXTENT_DATA 0) itemoff 15968 itemsize 53
        extent data disk byte 243376005120 nr 212992
        extent data offset 0 nr 212992 ram 212992
        extent compression 0
    item 5 key (890552 INODE_ITEM 0) itemoff 15808 itemsize 160
        inode generation 5386763 transid 5386764 size 616 nbytes 616
        block group 0 mode 100644 links 1 uid 10 gid 10
        rdev 0 flags 0x0
    item 6 key (890552 INODE_REF 31762) itemoff 15781 itemsize 27
        inode ref index 2736 namelen 17 name: dpkg.diversions.0
    item 7 key (890552 EXTENT_DATA 0) itemoff 15144 itemsize 637
        inline extent data size 616 ram 616 compress 0
    item 8 key (890553 INODE_ITEM 0) itemoff 14984 itemsize 160
        inode generation 5386763 transid 5386764 size 248 nbytes 248
        block group 0 mode 100644 links 1 uid 10 gid 10
        rdev 0 flags 0x0
    item 9 key (890553 INODE_REF 31762) itemoff 14954 itemsize 30
        inode ref index 2735 namelen 20 name: dpkg.diversions.1.gz
    item 10 key (890553 EXTENT_DATA 0) itemoff 14685 itemsize 269
        inline extent data size 248 ram 248 compress 0
    item 11 key (890554 INODE_ITEM 0) itemoff 14525 itemsize 160
        inode generation 5386763 transid 5386764 size 135 nbytes 135
        block group 0 mode 100644 links 1 uid 10 gid 10
        rdev 0 flags 0x0
    item 12 key (856762 INODE_REF 31762) itemoff 14496 itemsize 29
        inode ref index 2745 namelen 19 name: dpkg.statoverride.0
    item 13 key (890554 EXTENT_DATA 0) itemoff 14340 itemsize 156
        inline extent data size 135 ram 135 compress 0
    item 14 key (890555 INODE_ITEM 0) itemoff 14180 itemsize 160
        inode generation 5386763 transid 5386764 size 129 nbytes 129
        block group 0 mode 100644 links 1 uid 10 gid 10
        rdev 0 flags 0x0
    item 15 key (890555 INODE_REF 31762) itemoff 14148 itemsize 32
        inode ref index 2744 namelen 22 name: dpkg.statoverride.1.gz
    item 16 key (890555 EXTENT_DATA 0) itemoff 13998 itemsize 150
        inline extent data size 129 ram 129 compress 0
    item 17 key (890557 INODE_ITEM 0) itemoff 13838 itemsize 160
        inode generation 5386763 transid 5386763 size 787062 nbytes 790528
        block group 0 mode 100640 links 1 uid 100104 gid 14
        rdev 0 flags 0x0
    item 18 key (890557 INODE_REF 29289) itemoff 13817 itemsize 21
        inode ref index 1372 namelen 11 name: syslog.2.gz
    item 19 key (890557 EXTENT_DATA 0) itemoff 13764 itemsize 53
        extent data disk byte 243948204032 nr 790528
        extent data offset 0 nr 790528 ram 790528
        extent compression 0
    item 20 key (890558 INODE_ITEM 0) itemoff 13604 itemsize 160
        inode generation 5386763 transid 5389981 size 4047291 nbytes 4050944
        block group 0 mode 100640 links 1 uid 100104 gid 14
        rdev 0 flags 0x0
    item 21 key (890558 INODE_REF 29289) itemoff 13588 itemsize 16
        inode ref index 1374 namelen 6 name: syslog
    item 22 key (890558 EXTENT_DATA 0) itemoff 13535 itemsize 53
        extent data disk byte 240840228864 nr 12288
        extent data offset 0 nr 8192 ram 12288
        extent compression 0
    item 23 key (890558 EXTENT_DATA 8192) itemoff 13482 itemsize 53
        extent data disk byte 240837672960 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 24 key (890558 EXTENT_DATA 12288) itemoff 13429 itemsize 20
        extent data disk byte 240820076544 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 25 key (890558 EXTENT_DATA 16384) itemoff 13376 itemsize 53
        extent data disk byte 240784408576 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 26 key (890558 EXTENT_DATA 20480) itemoff 13323 itemsize 53
        extent data disk byte 240785170432 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 27 key (890558 EXTENT_DATA 24576) itemoff 13270 itemsize 53
        extent data disk byte 242839834624 nr 36864
        extent data offset 0 nr 32768 ram 36864
        extent compression 0
    item 28 key (890558 EXTENT_DATA 57344) itemoff 13217 itemsize 53
        extent data disk byte 242836987904 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 29 key (890558 EXTENT_DATA 61440) itemoff 13164 itemsize 53
        extent data disk byte 243400761344 nr 8192
        extent data offset 0 nr 4096 ram 8192
        extent compression 0
    item 30 key (890558 EXTENT_DATA 65536) itemoff 13111 itemsize 53
        extent data disk byte 243412983808 nr 8192
        extent data offset 0 nr 4096 ram 8192


Re: BTRFS critical (device sda2): corrupt leaf, bad key order: block=293438636032, root=1, slot=11

2017-08-31 Thread Hugo Mills
On Thu, Aug 31, 2017 at 01:53:58PM -0400, Eric Wolf wrote:
> I'm having issues with a bad block(?) on my root ssd.
> 
> dmesg is consistently outputting "BTRFS critical (device sda2):
> corrupt leaf, bad key order: block=293438636032, root=1, slot=11"
> 
> "btrfs scrub stat /" outputs "scrub status for b2c9ff7b-[snip]-48a02cc4f508
> scrub started at Wed Aug 30 11:51:49 2017 and finished after 00:02:55
> total bytes scrubbed: 53.41GiB with 2 errors
> error details: verify=2
> corrected errors: 0, uncorrectable errors: 2, unverified errors: 0"
> 
> Running "btrfs check --repair /dev/sda2" from a live system stalls
> after telling me corrupt leaf etc etc then "11 12". CPU usage hits
> 100% and disk activity remains at 0.

   This error is usually attributable to bad hardware. Typically RAM,
but might also be marginal power regulation (blown capacitor
somewhere) or a slightly broken CPU.

   Can you show us the output of "btrfs-debug-tree -b 293438636032 /dev/sda2"?

   Hugo.

-- 
Hugo Mills | "You got very nice eyes, Deedee. Never noticed them
hugo@... carfax.org.uk | before. They real?"
http://carfax.org.uk/  |
PGP: E2AB1DE4  | Don Logan, Sexy Beast

