On 2019/4/24 5:28 PM, Filipe Manana wrote:
[snip]
>>> So what's wrong with it? And how does it cause the ENOSPC?
>>
>> E.g.
>>
>> We have a 128M preallocated file extent.
>> And assume the fs only has 128M free data space, meaning 0 remaining
>> space at all.
>
> That's a contradicting sentence...
>
>>
>> Then we try a buffered write, which will just fail as it
>> needs data space.
>>
>> The idea is always here for fallocate/pwrite, just the timing where the
>> ENOSPC happens.
>
> Can't make sense of that sentence either.
My bad, that change is already in buffered_write(), so that sentence makes no sense.

>
> So I suppose what you are trying to say is that a write into an
> unwritten extent causes space allocation,
> and that can prevent some other write (which is not into an unwritten
> extent) from being able to allocate space and therefore fail.

That's one case.

>
> That's a valid problem that should be temporary.

I just tried a basic script:
---
#!/bin/bash
dev=/dev/test/test
mnt=/mnt/btrfs

# 512 MiB filesystem
mkfs.btrfs -f -b 512M "$dev"
mount "$dev" "$mnt"

# Preallocate 384 MiB
fallocate -l 384M "$mnt/file1"
echo "fallocate success"

# Buffered write of 768 x 512 KiB = 384 MiB
dd if=/dev/zero bs=512K conv=notrunc count=768 of="$mnt/file2"
umount "$mnt"
---

This fails just like in the error report.

At least in the current code, a buffered write into preallocated space does skip the data space reservation, so in theory that buffered write shouldn't cause the problem.

However, there are other places that can reserve data space:
- btrfs_page_mkwrite()
- btrfs_truncate_block()
- btrfs_direct_IO()

I haven't looked into why the above script fails yet, but it should be related to one of those data space reservations.

Thanks,
Qu

>
> However when allocating space for a write into an unwritten extent (or
> any nodatacow write) we increment the data space info's bytes_may_use
> counter,
> but then when writeback starts, if we don't need to fall back to
> CoW, we end up never decrementing the bytes_may_use counter (even
> after writeback completes), leaking it.
> Not sure if this is the problem you were mentioning or just causing
> other writes to temporarily fail.
>
> thanks
>
>
>>
>>
>> btrfs/153 has been failing for a long time for the same reason; although
>> there it comes from quota, the reason is completely the same.
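The leak in Filipe's quoted paragraph can be pictured as a toy counter model. This is only a sketch: `reserve_for_write` and `writeback` are hypothetical names, not btrfs functions, and the model simply assumes, following that paragraph, that the reservation is released on the CoW fallback path and nowhere else.

```shell
#!/bin/bash
# Toy model of the bytes_may_use leak described above. The function names
# are hypothetical; this is NOT btrfs code, just integer bookkeeping.

bytes_may_use=0

# Reserve data space for a buffered write into an unwritten (prealloc)
# extent: the counter is incremented up front.
reserve_for_write() {
    local len=$1
    bytes_may_use=$((bytes_may_use + len))
}

# Writeback: in this model only the CoW fallback path releases the
# reservation. On the NOCOW path (the prealloc extent is reused) nothing
# decrements the counter, so the reservation is leaked.
writeback() {
    local len=$1 need_cow=$2
    if [ "$need_cow" -eq 1 ]; then
        bytes_may_use=$((bytes_may_use - len))
    fi
}

len=$((128 * 1024 * 1024))    # 128 MiB write into a prealloc extent
reserve_for_write "$len"
writeback "$len" 0            # NOCOW: no fallback to CoW needed
echo "bytes_may_use after NOCOW writeback: $bytes_may_use"
```

In this model the counter still holds the full 128 MiB after the NOCOW writeback, whereas a write that falls back to CoW brings it back to where it started.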
>>
>> Thanks,
>> Qu
>>
>>>
>>> Trying the reproducer, at least on a 5.0 kernel, it never fails on a
>>> pwrite for me, but always on fallocate:
>>>
>>> $ mkfs.btrfs -f -b $((4 * 1024 * 1024 * 1024)) /dev/sdi
>>> $ mount /dev/sdi /mnt/sdi
>>> $ cd /mnt/sdi
>>> $ /path/to/reproducer
>>> reading from /dev/urandom
>>> writing to ./blob.IIa6tH
>>> writing blocks of 132096 bytes each
>>> total 125 MiB, 65.52 MiB/s
>>> total 251 MiB, 44.59 MiB/s
>>> total 377 MiB, 55.23 MiB/s
>>> total 503 MiB, 66.21 MiB/s
>>> total 629 MiB, 59.97 MiB/s
>>> total 755 MiB, 3.70 MiB/s
>>> total 881 MiB, 50.24 MiB/s
>>> total 1007 MiB, 64.51 MiB/s
>>> total 1133 MiB, 50.70 MiB/s
>>> total 1259 MiB, 49.29 MiB/s
>>> total 1385 MiB, 47.93 MiB/s
>>> total 1511 MiB, 4.00 MiB/s
>>> total 1637 MiB, 49.85 MiB/s
>>> total 1763 MiB, 48.11 MiB/s
>>> total 1889 MiB, 66.62 MiB/s
>>> total 2015 MiB, 5.60 MiB/s
>>> total 2141 MiB, 19.58 MiB/s
>>> total 2267 MiB, 64.80 MiB/s
>>> total 2393 MiB, 13.23 MiB/s
>>> total 2519 MiB, 14.95 MiB/s
>>> fallocate failed: No space left on device
>>>
>>> So either that was tested on a rather old kernel or:
>>>
>>> 1) we had snapshotting happening between a fallocate and a pwrite (or
>>> at the same time as the pwrite)
>>> 2) before (or during) the pwrite, the unwritten/prealloc extent was
>>> reflinked (cp --reflink, clone or dedupe ioctls)
>>>
>>> What did I miss here?
>>>
>>> Thanks.
>>>
>>>>
>>>> E.g. reserved space underflow.
>>>>
>>>> I'll find the old thread and retry.
>>>>
>>>> Thanks,
>>>> Qu
>>>>
>>>>> This seems to break the semantics of fallocate, so performance should
>>>>> not be the main concern here.
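The "total N MiB" lines in the quoted reproducer log line up with the stated block size if the reproducer reports once every 1000 blocks and truncates to whole MiB. That reporting interval is an assumption (the reproducer source is not in this thread); the arithmetic below only checks that it is consistent with the log.

```shell
#!/bin/bash
# Check the "total N MiB" lines in the quoted reproducer log.
# Assumption (the reproducer source is not in the thread): it reports once
# every 1000 blocks and truncates the total to whole MiB.

block=132096                 # "writing blocks of 132096 bytes each"
mib=$((1024 * 1024))

# Total MiB written after $1 blocks, truncated by integer division.
total_mib() {
    echo $(( $1 * block / mib ))
}

for reports in 1 2 3 20; do
    blocks=$((reports * 1000))
    echo "after $blocks blocks: total $(total_mib "$blocks") MiB"
done
```

This prints 125, 251, 377 and 2519 MiB, matching the first three reports and the twentieth (last) one before fallocate failed.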
