Re: Scrub aborts due to corrupt leaf

Holger Hoffstätte Wed, 10 Oct 2018 11:25:51 -0700

On 10/10/18 19:44, Chris Murphy wrote:

On Wed, Oct 10, 2018 at 10:04 AM, Holger Hoffstätte
<hol...@applied-asynchrony.com> wrote:

On 10/10/18 17:44, Larkin Lowrey wrote:
(..)


About once a week or so, I'm running into the above situation where
the FS seems to deadlock. All IO to the FS blocks, there is no IO
activity at all. I have to hard reboot the system to recover. There
are no error indications except for the following which occurs well
before the FS freezes up:

BTRFS warning (device dm-3): block group 78691883286528 has wrong amount
of free space
BTRFS warning (device dm-3): failed to load free space cache for block
group 78691883286528, rebuilding it now

Do I have any options other than nuking the FS and starting over?



Unmount cleanly & mount again with -o space_cache=v2.


I'm pretty sure you have to umount, and then clear the space_cache
with 'btrfs check --clear-space-cache=v1' and then do a one time mount
with -o space_cache=v2.
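
As a rough sketch, the sequence described above might look like the
following. The device and mount point paths are hypothetical placeholders,
not taken from this thread, and the function only prints the commands
(a dry run) rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch: prints the three steps instead of running them.
# Both arguments are placeholders (assumptions), not values from the thread.
space_cache_v2_steps() {
    dev=$1   # block device holding the btrfs filesystem
    mnt=$2   # where that filesystem is normally mounted

    # 1. Unmount cleanly; btrfs check works on an unmounted filesystem.
    echo "umount $mnt"
    # 2. Drop the existing v1 free space cache.
    echo "btrfs check --clear-space-cache v1 $dev"
    # 3. One-time mount that creates the v2 cache (free space tree).
    echo "mount -o space_cache=v2 $dev $mnt"
}

space_cache_v2_steps /dev/mapper/example /mnt/example
```

To actually run the steps, substitute real paths and execute the printed
commands as root, one at a time, checking each for errors.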

But anyway, to me that seems premature because we don't even know
what's causing the problem.


Space cache writeout not honoring errors from the depths below
is not unusual; I think there were some fixes recently which Larkin
likely doesn't have yet. But yeah, I forgot to mention that cache-v2
alone won't really fix the _underlying_ problem. It is, however,
vastly more reliable in general.

a. Freezing means there's a kernel bug. Hands down.
b. Is it freezing on the rebuild? Or something else?
c. I think the devs would like to see the output from btrfs-progs
v4.17.1, 'btrfs check --mode=lowmem' and see if it finds anything, in
particular something not related to free space cache.
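
A minimal sketch of how that output could be collected, assuming a
hypothetical device path and log file name (neither comes from this
thread). It only prints the commands; the filesystem should be unmounted
before the check is actually run.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for gathering check output.
# The device path and log file name are placeholders (assumptions).
lowmem_check_steps() {
    dev=$1   # unmounted block device holding the btrfs filesystem
    log=$2   # file to save the output in for the mailing list

    # Record which btrfs-progs version produced the report.
    echo "btrfs --version"
    # Read-only check in low-memory mode, keeping stdout and stderr.
    echo "btrfs check --mode=lowmem $dev 2>&1 | tee $log"
}

lowmem_check_steps /dev/mapper/example btrfs-check-lowmem.log
```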


Apart from performance implications, if only the free space cache
inodes/blocks are borked then the rest will (should) work just fine
and/or be replaced/overwritten eventually.

Well, at least that was the idea. :}

-h
