On 04/04/2018 12:57 AM, Zygo Blaxell wrote:
>> I have to point out that in any case the extent is physically
>> interrupted at the disk-stripe size. Assuming disk-stripe=64KB, if
>> you want to write 128KB, the first half is written in the first disk,
>> the other in the 2nd disk. If you want to write 96kb, the first 64
>> are written in the first disk, the last part in the 2nd, only on a
>> different BG.
> The "only on a different BG" part implies something expensive, either
> a seek or a new erase page depending on the hardware. Without that,
> nearby logical blocks are nearby physical blocks as well.

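To make the 64KB example concrete, here is a minimal sketch of how a write is cut at the disk-stripe boundary. It is only an illustration under simplifying assumptions: two data disks, parity placement ignored, and no modelling of the proposed "tail goes to a different BG" step. The name split_write and its parameters are hypothetical, not btrfs code.

```python
DISK_STRIPE = 64 * 1024  # assumed disk-stripe size, as in the example above


def split_write(offset, length, ndata=2, stripe=DISK_STRIPE):
    """Return (data_disk_index, byte_count) chunks for one write.

    Simplified model: only data disks are considered and the write is
    cut every time it crosses a disk-stripe boundary.
    """
    chunks = []
    end = offset + length
    while offset < end:
        disk = (offset // stripe) % ndata          # disk holding this offset
        n = min(stripe - offset % stripe, end - offset)  # bytes left in this disk-stripe
        chunks.append((disk, n))
        offset += n
    return chunks


print(split_write(0, 128 * 1024))  # [(0, 65536), (1, 65536)]
print(split_write(0, 96 * 1024))   # [(0, 65536), (1, 32768)]
```

In this model both the 128KB and the 96KB write end up on two disks; the only open question is in which BG the tail on the second disk is allocated.
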
In any case it happens on a different disk.

>
>> So yes there is a fragmentation from a logical point of view; from a
>> physical point of view the data is spread on the disks in any case.
> What matters is the extent-tree point of view. There is (currently)
> no fragmentation there, even for RAID5/6. The extent tree is unaware
> of RAID5/6 (to its peril).

Earlier you pointed out that writing non-contiguous blocks has an impact
on performance. I am replying that the switch to a different BG happens
at the disk-stripe boundary, so in any case the block is physically
interrupted and continues on another disk.

However, yes: from an extent-tree point of view the number of extents
will increase, because the end of the write is allocated to another BG
(if the size is not aligned to the stripe boundary).

> If an application does a loop writing 68K then fsync(), the multiple-BG
> solution adds two seeks to read every 68K. That's expensive if sequential
> read bandwidth is more scarce than free space.

Why do you talk about additional seeks? In any case (even without the
additional BG) the read happens from another disk.

>> * c),d),e) are applied only for the tail of the extent, in case the
>> size is less than the stripe size.
>
> It's only necessary to split an extent if there are no other writes
> in the same transaction that could be combined with the extent tail
> into a single RAID stripe. As long as everything in the RAID stripe
> belongs to a single transaction, there is no write hole

Maybe a simpler optimization would be to close the transaction when the
data reaches the stripe boundary... But I suspect that it is not so
simple to implement.

> Not for d. Balance doesn't know how to get rid of unreachable blocks
> in extents (it just moves the entire extent around) so after a balance
> the writes would still be rounded up to the stripe size. Balance would
> never be able to free the rounded-up space. That space would just be
> gone until the file was overwritten, deleted, or defragged.

If balance is capable of moving the extent, why not place one next to
the other during a balance? The goal is not to limit the writing of the
end of an extent, but to avoid writing the end of an extent without
further data (e.g. the gap to the stripe has to be filled in the same
transaction).

BR
G.Baroncelli

-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5