On Tue, 7 Feb 2017 10:43:11 -0500, "Austin S. Hemmelgarn" <ahferro...@gmail.com> wrote:
> > I mean that:
> > You have a 128MB extent, you rewrite random 4k sectors, btrfs will
> > not split the 128MB extent and will not free up data (I don't know
> > the internal algorithm, so I can't predict when this will happen),
> > and after some time btrfs will rebuild extents and split the 128MB
> > extent into several smaller ones. But when you use compression, the
> > allocator rebuilds extents much earlier (I think it's because btrfs
> > also operates on that as 128kb extents, even if it's a continuous
> > 128MB chunk of data).
>
> The allocator has absolutely nothing to do with this, it's a function
> of the COW operation. Unless you're using nodatacow, that 128MB
> extent will get split the moment the data hits the storage device
> (either on the next commit cycle (at most 30 seconds with the default
> commit interval), or when fdatasync is called, whichever is sooner).
> In the case of compression, it's still one extent (although on disk
> it will be less than 128MB) and it will be split at _exactly_ the
> same time under _exactly_ the same circumstances as an uncompressed
> extent. IOW, it has absolutely nothing to do with the extent handling
> either.

I don't think that btrfs splits extents which are part of a snapshot.
The extent in a snapshot will stay intact when that extent is written
to in another snapshot. Of course, in the just-written snapshot, the
extent will be represented as a split extent mapping to the original
extent's data blocks plus the new data in the middle (thus resulting
in three extents). This is also why small random writes without
autodefrag result in a vast amount of small extents, bringing the fs
performance to a crawl.

Do that multiple times on multiple snapshots, delete some of the
original snapshots, and you're left with slack space: data blocks that
are inaccessible but won't be reclaimed into free space (because they
are still part of the original extent), and which can only be
reclaimed by a defrag operation - which of course unshares data.
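The mechanism can be sketched with a small reference-counting model. This is a toy simulation, not real btrfs code; all names (Extent, snapshot, cow_write, and so on) are invented for illustration. It shows the two points above: a CoW write into the middle of a shared extent produces three mappings in the written snapshot, and the original on-disk extent survives whole until no mapping references any part of it.

```python
# Toy model of CoW extent sharing; not real btrfs code or APIs.

class Extent:
    def __init__(self, size):
        self.size = size   # full on-disk size; never shrinks
        self.refs = 0      # number of file mappings pointing into it

def ref(e, off, length):
    """Create a file mapping (extent, offset-in-extent, length)."""
    e.refs += 1
    return (e, off, length)

def release(file_map):
    """Delete a file/snapshot: drop its extent references."""
    for e, _, _ in file_map:
        e.refs -= 1

def snapshot(file_map):
    """Snapshotting only re-references the same extents."""
    return [ref(e, off, ln) for e, off, ln in file_map]

def cow_write(file_map, pos, length, extents):
    """CoW-overwrite [pos, pos+length); for simplicity the range is
    assumed to lie inside a single existing mapping."""
    new_ext = Extent(length)
    extents.append(new_ext)
    out, cursor = [], 0
    for e, off, ln in file_map:
        if cursor <= pos and pos + length <= cursor + ln:
            head = pos - cursor
            tail = cursor + ln - (pos + length)
            if head:
                out.append(ref(e, off, head))
            out.append(ref(new_ext, 0, length))
            if tail:
                out.append(ref(e, off + head + length, tail))
        else:
            out.append(ref(e, off, ln))
        cursor += ln
    release(file_map)
    return out

def allocated(extents):
    """On-disk space held by extents that are still referenced."""
    return sum(e.size for e in extents if e.refs > 0)

MiB, KiB = 1024 * 1024, 1024
orig = Extent(128 * MiB)
extents = [orig]
original = [ref(orig, 0, orig.size)]   # file: one 128 MiB extent
snap = snapshot(original)              # shares the same extent

snap = cow_write(snap, 64 * MiB, 4 * KiB, extents)
print(len(snap))                       # 3: head + new 4 KiB + tail

release(original)                      # delete the original subvolume
# No space is freed: snap's head and tail still reference the whole
# 128 MiB extent, even though 4 KiB of it is now unreachable.
print(allocated(extents) == 128 * MiB + 4 * KiB)   # True

release(snap)                          # rewrite/drop the last pieces
print(allocated(extents))              # 0: refcount hit zero, freed
```

In this model, deleting the original subvolume reclaims nothing, which is exactly the slack-space situation described above; only dropping the last partial reference lets the whole extent go.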
Thus, if any of the above-mentioned small extents is still shared with
an originally much bigger extent, it will still occupy its original
space on the filesystem - even when its associated snapshot/subvolume
no longer exists. Only when the last remaining tiny block of such an
extent is rewritten and the reference counter drops to zero is the
extent given up and freed.

To work around this, you can currently only unshare and recombine by
running defrag and dedupe on all snapshots. This reclaims the space
sitting in parts of the original extents that are no longer referenced
by any snapshot visible from the VFS layer.

This is for performance reasons, because btrfs is extent based. As far
as I know, ZFS on the other side works differently. It uses block-based
storage for the snapshot feature and can easily throw away unused
blocks. Only a second layer on top maps this back into extents. The
underlying infrastructure, however, is block-based storage, which also
enables the volume pool to create block devices on the fly out of ZFS
storage space.

PS: All of the above given the fact I understood it right. ;-)

-- 
Regards,
Kai

Replies to list-only preferred.