Re: Scrub aborts due to corrupt leaf

Larkin Lowrey Wed, 10 Oct 2018 11:31:28 -0700

On 10/10/2018 2:20 PM, Holger Hoffstätte wrote:

On 10/10/18 19:25, Larkin Lowrey wrote:

On 10/10/2018 12:04 PM, Holger Hoffstätte wrote:

On 10/10/18 17:44, Larkin Lowrey wrote:
(..)

About once a week, or so, I'm running into the above situation where
FS seems to deadlock. All IO to the FS blocks, there is no IO
activity at all. I have to hard reboot the system to recover. There
are no error indications except for the following which occurs well
before the FS freezes up:
BTRFS warning (device dm-3): block group 78691883286528 has wrong amount of free space
BTRFS warning (device dm-3): failed to load free space cache for block group 78691883286528, rebuilding it now
Do I have any options other than nuking the FS and starting over?
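
The two warnings above can be pulled out of a saved kernel log to see which block groups are affected. A minimal Python sketch, assuming the message wording quoted above (other kernel versions may phrase it differently); the function name `affected_block_groups` and the sample text are illustrative only:

```python
import re

# Assumption: the message format matches the two warnings quoted in this thread.
WARNING_RE = re.compile(
    r"BTRFS warning \(device (?P<device>[^)]+)\): "
    r"(?:block group (?P<bg1>\d+) has wrong amount of free space"
    r"|failed to load free space cache for block group (?P<bg2>\d+))"
)


def affected_block_groups(log_text):
    """Return a sorted list of unique (device, block_group) pairs found in log_text."""
    found = set()
    for match in WARNING_RE.finditer(log_text):
        # Exactly one of the two alternatives captures the block group number.
        block_group = match.group("bg1") or match.group("bg2")
        found.add((match.group("device"), int(block_group)))
    return sorted(found)


# Sample input: the two warning lines from the report above.
sample = (
    "BTRFS warning (device dm-3): block group 78691883286528 "
    "has wrong amount of free space\n"
    "BTRFS warning (device dm-3): failed to load free space cache "
    "for block group 78691883286528, rebuilding it now\n"
)

print(affected_block_groups(sample))  # [('dm-3', 78691883286528)]
```

Both warnings refer to the same block group, so the sketch reports a single pair. Fed a longer log, it would show whether the same block group keeps recurring or whether different ones are hit each time.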


Unmount cleanly & mount again with -o space_cache=v2.


It froze while unmounting. The attached zip is a stack dump captured
via 'echo t > /proc/sysrq-trigger'. A second attempt after a hard
reboot worked.


Trace says freespace cache writeout failed midway while the scsi device
was resetting itself and then went aaaarrrghh. Probably managed to hit
different blocks on the second attempt. So chances are your controller,
disk or something else is broken, dying, or both.
When things have settled and you have verified that r/o mounting works
and is stable, try rescuing the data (when necessary) before scrubbing,
dm-device-checking or whatever you have set up.

Interesting, because I do not see any indications of any other errors.
The fs is backed by an mdraid array and the raid checks always pass with
no mismatches, edac-util doesn't report any ECC errors, smartd doesn't
report any SMART errors, and I never see any raid controller errors. I
have the console connected through serial to a logging console server so
if there were errors reported I would have seen them.


--Larkin
