Re: "No space left on device" and balance doesn't work

Austin S. Hemmelgarn Thu, 02 Jun 2016 05:57:08 -0700

On 2016-06-01 14:30, MegaBrutal wrote:

Hi all,


I have a 20 GB file system and df says I have about 2,6 GB free space,
yet I can't do anything on the file system because I get "No space
left on device" errors. I read that balance may help to remedy the
situation, but it actually doesn't.


Some data about the FS:


root@ReThinkCentre:~# df -h /
Fájlrendszer                Méret Fogl. Szab. Fo.% Csatol. pont
/dev/mapper/centrevg-rootlv   20G   18G  2,6G  88% /

root@ReThinkCentre:~# btrfs fi show /
Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
        Total devices 1 FS bytes used 15.42GiB
        devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv

root@ReThinkCentre:~# btrfs fi df /
Data, single: total=16.69GiB, used=14.14GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.62GiB, used=1.28GiB
GlobalReserve, single: total=352.00MiB, used=0.00B

root@ReThinkCentre:~# btrfs version
btrfs-progs v4.4


This happens when I try to balance:

root@ReThinkCentre:~# btrfs fi balance start -dusage=66 /
Done, had to relocate 0 out of 33 chunks
root@ReThinkCentre:~# btrfs fi balance start -dusage=67 /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail


"dmesg | tail" does not show anything related to this.

It is important to note that the file system currently has 32
snapshots of / at the moment, and snapshots taking up all the free
space is a plausible explanation. Maybe deleting some of the oldest
snapshots or just increasing the file system would help the situation.
However, I'm still interested, if the file system is full, why does df
show there is free space, and how could I show the situation without
having the mentioned options? I actually have an alert set up which
triggers when the FS usage reaches 90%, so then I know I have to
delete some old snapshots. It worked so far, I cleaned the snapshots
at 90%, FS usage fell back, everyone was happy. But now the alert
didn't even trigger because the FS is at 88% usage, so it shouldn't be
full yet.

The first thing that needs to be understood is that df has been pretty much unchanged since it was introduced in the 70's (IIRC, it was in at least SVR4, possibly earlier UNIX versions too). Back then, it was pretty easy to say what percentage of space was used and how much was left. A filesystem only allocated one set of blocks for a file, it didn't need extra space for updates, and the file took up exactly as much space as its size on disk (usually; it can get kind of complicated based on a number of factors). In addition, traditional UFS had a fixed-size metadata area for the inodes, which simplified computations even more.

In BTRFS though, almost all of the assumptions which the original interface made aren't guaranteed.

Now, the biggest difference though is in how BTRFS allocates space. BTRFS uses a two-tier allocation system. First, you have high-level allocations of what are usually referred to as chunks, and then it allocates blocks within those chunks. The balance operation operates at the chunk level, whereas things like defragmentation operate at the block level. For performance reasons, BTRFS usually has separate chunks for metadata and data. Data chunks are usually 1GB, and metadata chunks are usually 256MB, although both can vary in size based on the size of the filesystem. Figuring out the exact size gets tricky on a live filesystem, but if your filesystem is between 16G and 64G, you're pretty much guaranteed to have chunks which are the default size.
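To make the two tiers concrete, here is a minimal sketch that reads the `btrfs fi df` output quoted above and reports, per chunk type, how much is allocated to chunks versus actually used inside them. The function names (`parse_fi_df`, `slack_inside_chunks`) are made up for illustration, and the parser assumes the output has exactly the line layout shown in this thread.

```python
import re

# Output of `btrfs fi df /` as quoted earlier in this thread.
FI_DF = """\
Data, single: total=16.69GiB, used=14.14GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.62GiB, used=1.28GiB
GlobalReserve, single: total=352.00MiB, used=0.00B
"""

UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}

# Assumed line layout: "<type>, <profile>: total=<n><unit>, used=<n><unit>"
LINE = re.compile(
    r"^(\w+), (\w+): total=([\d.]+)([A-Za-z]+), used=([\d.]+)([A-Za-z]+)$"
)


def parse_fi_df(text):
    """Return {chunk type: (profile, total_bytes, used_bytes)}."""
    result = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a usage line
        kind, profile, total, total_unit, used, used_unit = m.groups()
        result[kind] = (
            profile,
            float(total) * UNITS[total_unit],
            float(used) * UNITS[used_unit],
        )
    return result


def slack_inside_chunks(parsed):
    """Bytes allocated to chunks of each type but not used by blocks."""
    return {kind: total - used for kind, (_profile, total, used) in parsed.items()}


if __name__ == "__main__":
    parsed = parse_fi_df(FI_DF)
    for kind, slack in slack_inside_chunks(parsed).items():
        profile, total, used = parsed[kind]
        print(f"{kind:14s} {profile:7s} allocated={total / UNITS['GiB']:.2f}GiB "
              f"slack={slack / UNITS['GiB']:.2f}GiB")
```

"total" here is the chunk-level allocation and "used" is the block-level usage, so the difference is space that regular writes can still use but that does not help balance.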

Now, because of the segregation of data and metadata, and how chunk allocation works, it's possible to end up in a situation where you technically have free space, but you can't actually do anything with it. This is because most file operations on BTRFS require at least a few blocks of metadata space so that the COW updates can happen. Luckily, you don't appear to be quite at that point.

For compatibility reasons, we have to report _something_ through df. We can't however report many of the situational things about the state of the FS itself (for example, if you have all the possible chunks allocated, no space in data chunks, but free space in metadata chunks, it's possible to create a lot of very small files, but creating a big one will fail). As a result of this, what we report through df is technically absolutely correct (in your case, you _do_ technically have 2.6G of free space), but is also absolutely useless for any kind of management decision.

In your particular situation, what's happened is that you have all the space allocated to chunks, but have free space within those chunks. Balance never puts data in existing chunks, and you can't allocate any new chunks, so you can't run a balance. However, because of that free space in the chunks, you can still use the filesystem itself for 'regular' filesystem operations.
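The numbers you posted bear this out. As a rough worked example, the sketch below adds up the chunk totals from your `btrfs fi df` output, counting DUP chunks twice because DUP keeps two copies on the device, and compares the sum with the 20.00GiB device size from `btrfs fi show`. It assumes GlobalReserve is carved out of metadata space rather than being a separate allocation, so it is left out of the sum; the variable and function names are made up for illustration.

```python
GIB = 1024**3
MIB = 1024**2

# (logical chunk total from `btrfs fi df`, copies kept on disk for the profile)
# single = 1 copy, DUP = 2 copies on the same device.
CHUNKS = {
    "Data (single)": (16.69 * GIB, 1),
    "System (DUP)": (32.00 * MIB, 2),
    "Metadata (DUP)": (1.62 * GIB, 2),
}

# "size" from the devid line of `btrfs fi show`.
DEVICE_SIZE = 20.00 * GIB


def raw_allocated(chunks):
    """Raw bytes taken from the device by all chunks, including extra copies."""
    return sum(total * copies for total, copies in chunks.values())


allocated = raw_allocated(CHUNKS)
unallocated = DEVICE_SIZE - allocated

if __name__ == "__main__":
    print(f"allocated to chunks: {allocated / GIB:.2f} GiB")
    print(f"left unallocated:    {unallocated / MIB:.1f} MiB")
```

Allowing for rounding in the reported values, that is the whole device: what remains is smaller than a single 256MB metadata chunk, let alone a 1GB data chunk, which is why balance has nowhere to put a new chunk.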

In this situation, Henk's suggestion of adding another device is one of three options for dealing with this. The other two options (which are usually less practical for most people) are to resize the filesystem to have more space, or recreate it from scratch.

As far as avoiding this in the future, the best option is to keep an eye on the output of fi show, and keep the per-device 'used' value at least a few GB below the device size. I usually go for about 2GB or 0.2% of the device size, whichever is bigger. This will give you enough headroom for at least a few chunks to be allocated so that balance can proceed.
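Since you already have an alert based on df, here is a minimal sketch of the same idea applied to `btrfs fi show` instead. It parses the devid lines in the layout shown in your output and flags any device whose unallocated space is below the larger of 2GB or 0.2% of the device size. The function name `headroom_report` and its parameters are hypothetical, and the human-readable sizes are rounded, so treat the result as approximate.

```python
import re

GIB = 1024**3
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": GIB, "TiB": 1024**4}

# Output of `btrfs fi show /` as quoted earlier in this thread.
FI_SHOW = """\
Label: 'RootFS'  uuid: 3f002b8d-8a1f-41df-ad05-e3c91d7603fb
        Total devices 1 FS bytes used 15.42GiB
        devid    1 size 20.00GiB used 20.00GiB path /dev/mapper/centrevg-rootlv
"""

# Assumed layout: "devid <n> size <n><unit> used <n><unit> path <device>"
DEVID = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)([A-Za-z]+)"
    r"\s+used\s+([\d.]+)([A-Za-z]+)\s+path\s+(\S+)"
)


def headroom_report(text, min_bytes=2 * GIB, min_fraction=0.002):
    """Return [(path, unallocated_bytes, required_bytes, ok)] per device.

    'required' is the larger of min_bytes and min_fraction of the device
    size, following the rule of thumb given above.
    """
    report = []
    for m in DEVID.finditer(text):
        _devid, size, size_unit, used, used_unit, path = m.groups()
        size_bytes = float(size) * UNITS[size_unit]
        used_bytes = float(used) * UNITS[used_unit]
        required = max(min_bytes, size_bytes * min_fraction)
        unallocated = size_bytes - used_bytes
        report.append((path, unallocated, required, unallocated >= required))
    return report


if __name__ == "__main__":
    for path, unallocated, required, ok in headroom_report(FI_SHOW):
        status = "OK" if ok else "LOW: delete snapshots or balance now"
        print(f"{path}: unallocated={unallocated / GIB:.2f}GiB "
              f"(want >= {required / GIB:.2f}GiB) {status}")
```

To use it for real, feed it the text printed by `btrfs fi show` on the mount point (for example, captured from a cron job) in place of the `FI_SHOW` sample. The point of checking this value rather than df's percentage is that it goes low while balance can still run.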

