David Goodwin posted on Thu, 04 Oct 2018 17:44:46 +0100 as excerpted:

> While trying to run/use bedup ( https://github.com/g2p/bedup ) .... I
> hit this :
> 
> 
> [Thu Oct  4 15:34:51 2018] ------------[ cut here ]------------
> [Thu Oct  4 15:34:51 2018] BTRFS: Transaction aborted (error -28)
> [Thu Oct  4 15:34:51 2018] WARNING: CPU: 0 PID: 28832 at
> fs/btrfs/ioctl.c:3671 clone_finish_inode_update+0xf3/0x140 

> [Thu Oct  4 15:34:51 2018] CPU: 0 PID: 28832 Comm: bedup Not tainted
> 4.18.10-psi-dg1 #1

[snipping a bunch of stuff that I as a non-dev list regular can't do much 
with anyway]

> [Thu Oct  4 15:34:51 2018] BTRFS: error (device xvdg) in
> clone_finish_inode_update:3671: errno=-28 No space left
> [Thu Oct  4 15:34:51 2018] BTRFS info (device xvdg): forced readonly 

> % btrfs fi us /filesystem/
> Overall:
>      Device size:           7.12TiB
>      Device allocated:      6.80TiB
>      Device unallocated:    330.93GiB
>      Device missing:        0.00B
>      Used:                  6.51TiB
>      Free (estimated):      629.87GiB    (min: 629.87GiB)
>      Data ratio:            1.00
>      Metadata ratio:        1.00
>      Global reserve:        512.00MiB    (used: 0.00B)
> 
> Data+Metadata,single: Size:6.80TiB, Used:6.51TiB
>     /dev/xvdf       1.69TiB
>     /dev/xvdg       3.12TiB
>     /dev/xvdi       1.99TiB
> 
> System,single: Size:32.00MiB, Used:780.00KiB
>     /dev/xvdf      32.00MiB
> 
> Unallocated:
>     /dev/xvdf     320.97GiB
>     /dev/xvdg     949.00MiB
>     /dev/xvdi       9.03GiB
> 
> 
> I kind of think there is sufficient free space..... at least globally
> within the filesystem.
> 
> Does it require balancing to redistribute the unallocated space better?
> Or is something misbehaving?

The latter, but unfortunately there's not much you can do about it at 
this point but wait for fixes, unless you want to split up that huge 
filesystem into several smaller ones.

In general, btrfs has at least four kinds of "space" that it can run out 
of, tho in your case it appears you're running mixed-mode so data and 
metadata space are combined into one.

* Unallocated space:  This is space not yet allocated to any chunk at 
all.  It matters most when the allocation balance between data and 
metadata chunks gets skewed.

This isn't a problem for you, as in single mode chunks can be allocated 
from any device and you have one with hundreds of gigs unallocated.  It 
also tends to be less of a problem in mixed-bg mode, which you're 
running, since in mixed mode there's no distinction between data and 
metadata.
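For the record, if the per-device imbalance ever did become the issue 
here (xvdg is down to under a gig unallocated while xvdf still has 
~321 GiB), the usual remedy would be a filtered balance, which rewrites 
partly-used chunks and lets the allocator place the new copies on the 
devices with the most unallocated space.  The usage cutoff below is 
purely an illustrative starting point, and on a mixed-bg filesystem the 
data and metadata filters generally have to match:

% btrfs device usage /filesystem/
% btrfs balance start -dusage=50 -musage=50 /filesystem/

Progress can be checked with btrfs balance status while it runs.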

* Data chunk space:
* Metadata chunk space:

Because you're running mixed-bg mode there's no distinction between 
these two, but in normal mode, running out of one while all the 
remaining space is already allocated to chunks of the other type can be 
a problem.
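On a normal (non-mixed) filesystem the classic symptom is btrfs fi df 
output looking something like the below (numbers purely made up for 
illustration): data chunks have claimed nearly all the raw space while 
metadata is almost full.  The usual first-aid is a usage-filtered 
balance that compacts mostly-empty data chunks and returns their space 
to unallocated, where it can then be claimed for new metadata chunks:

% btrfs fi df /mnt
Data, single: total=915.00GiB, used=650.23GiB
System, single: total=32.00MiB, used=112.00KiB
Metadata, single: total=4.00GiB, used=3.97GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

% btrfs balance start -dusage=25 /mnt

Again, the figures and the usage cutoff are just examples, not anything 
measured on your filesystem.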

* Global reserve:  Taken from metadata, the global reserve is space the 
system won't normally use, kept clear so that transactions can always 
be finished once they're started, since btrfs' copy-on-write semantics 
mean even deleting stuff temporarily requires a bit of additional 
space.
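The reserve shows up in the Global reserve line of the btrfs fi us 
output you posted, and as a GlobalReserve line in btrfs fi df.  A 
"used" figure above zero there during otherwise normal operation is 
generally taken as a sign of this kind of reserve pressure, tho I'd 
treat it as a rough indicator rather than anything definitive:

% btrfs fi df /filesystem/
[...]
GlobalReserve, single: total=512.00MiB, used=0.00B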

This seems to be where the actual problem is, because currently certain 
btrfs operations such as reflinking/cloning/snapshotting (that is, just 
what you were doing) don't really calculate the needed space correctly 
and use arbitrary figures, which can be *wildly* off.  Meanwhile, a 
bare half-gig of global reserve for a huge 7+ TiB filesystem seems 
proportionally rather small.  (Consider that my small pair-device btrfs 
raid1 root filesystem, 8 GiB per device, 16 GiB total, has a 16 MiB 
reserve; at that proportion your 7+ TiB filesystem would have a 7+ GiB 
reserve, but it only has half a GiB.)
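Rough arithmetic behind that comparison, just spelling out the 
proportion:

    16 MiB reserve / 16 GiB filesystem  =  1/1024
    7.12 TiB x 1/1024                  ~=  7.1 GiB

vs. the half GiB it actually has, so roughly a factor of 14 smaller 
than straight proportional scaling would suggest.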

So relatively small btrfs filesystems don't tend to run into the 
problem, because they have proportionally larger reserves to begin 
with.  Plus they probably don't have proportionally as many 
snapshots/reflinks/etc either, so the problem simply doesn't trigger 
for them.

Now I'm not a dev and my own use-case doesn't include either 
snapshotting or deduping, so I haven't paid that much attention to the 
specifics, but I have seen some recent patches on-list that, based on 
the explanations, should go some way toward fixing this problem by 
using more realistic figures for the global-reserve calculations.  At 
this point those patches would be for 4.20 (which might be 5.0), or 
possibly 4.21, but the devs are indeed working on the problem and it 
should get better within a couple of kernel cycles.

Alternatively, perhaps the global reserve size could be bumped up on 
such large filesystems, but let's see if the more realistic 
per-operation reserve calculations can fix things first, as arguably 
that shouldn't be necessary once the calculations aren't so arbitrarily 
wild.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
