On 2015-09-15 16:56, Josef Bacik wrote:
On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
I've been experiencing repetitive "kernel BUG" occurrences in the past
few days while trying to balance a raid5 filesystem after adding a new
drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
I've run a scrub on this filesystem after the crash happened twice, and
it found no errors.
The BUG_ON() condition that my filesystem triggers is the following:
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.
I've compiled a fresh 4.3.0-rc1 with a couple of printk()s added just
before the BUG_ON() to dump the parameters passed to
insert_inline_extent_backref() when the problem occurs.
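The added instrumentation was essentially the following (a rough sketch,
the exact printk wording and format specifiers may differ slightly, but
the dumped fields are the same):

/* dump the arguments just before the existing BUG_ON() in
 * insert_inline_extent_backref(), fs/btrfs/extent-tree.c */
if (owner < BTRFS_FIRST_FREE_OBJECTID) {
        printk(KERN_ERR "{btrfs} in insert_inline_extent_backref, "
               "got owner < BTRFS_FIRST_FREE_OBJECTID\n");
        printk(KERN_ERR "{btrfs} with bytenr=%llu num_bytes=%llu "
               "parent=%llu root_objectid=%llu owner=%llu offset=%llu "
               "refs_to_add=%d BTRFS_FIRST_FREE_OBJECTID=%llu\n",
               bytenr, num_bytes, parent, root_objectid, owner, offset,
               refs_to_add, (u64)BTRFS_FIRST_FREE_OBJECTID);
}
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);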
Here is an excerpt of the resulting dmesg:
{btrfs} in insert_inline_extent_backref, got owner <
BTRFS_FIRST_FREE_OBJECTID
{btrfs} with bytenr=4557830635520 num_bytes=16384 parent=4558111506432
root_objectid=3339 owner=1 offset=0 refs_to_add=1
BTRFS_FIRST_FREE_OBJECTID=256
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent-tree.c:1837!
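For reference, if I'm reading fs/btrfs/ctree.h correctly, the threshold
in that check is simply:

/* objectids below this value are reserved for btrfs-internal trees;
 * regular inodes (and thus data extent owners) start at 256 */
#define BTRFS_FIRST_FREE_OBJECTID 256ULL

so an owner of 1 can only describe a metadata/tree reference, never a
data extent belonging to an inode, which is precisely what this BUG_ON()
asserts cannot happen on this path.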
I'll retry with the exact same kernel once I get the machine back up,
and see whether the bug happens again at the same spot of the filesystem
or at a different one.
The variable amount of time that elapses between starting a balance and
hitting the bug suggests it would be a different one.
Does btrfsck complain at all?
Thanks for your suggestion.
You're right: even though btrfs scrub didn't complain, btrfsck does:
checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
[... some more ...]
extent buffer leak: start 4561066901504 len 16384
extent buffer leak: start 4561078812672 len 16384
extent buffer leak: start 4561078861824 len 16384
[... some more ...]
followed by some complaints about mismatched counts for qgroups.
I can see from the btrfsck source code that --repair will not handle
these, so I didn't try it.
I'm not sure whether those errors are a cause or a consequence of the
bug. As the filesystem was only a few days old and a balance was always
running during the crashes, I'd be tempted to think they are a
consequence, but I can't be sure.
In your experience, could these inconsistencies cause the crash?
If you think so, then I'll btrfs dev del the third device, remount the
array degraded with just one disk, create a new btrfs filesystem from
scratch on the second, copy the data over in single redundancy, then
re-add the two disks and balance-convert back to raid5.
If you think not, then this array could still help you debug a corner
case, and I can keep it in this state for a couple of days if more
testing/debugging is needed.
Thanks,
--
Stéphane