On 07/04/2021 19:54, Josef Bacik wrote:
> On 3/15/21 1:53 AM, Naohiro Aota wrote:
>> This commit moves the location of superblock logging zones. The location of
>> the logging zones are determined based on fixed block addresses instead of
>> on fixed zone numbers.
>>
>> By locating the superblock zones using fixed addresses, we can scan a
>> dumped file system image without the zone information. And, no drawbacks
>> exist.
>>
>> We use the following three pairs of zones containing fixed offset
>> locations, regardless of the device zone size.
>>
>> - Primary superblock: zone starting at offset 0 and the following zone
>> - First copy: zone containing offset 64GB and the following zone
>> - Second copy: zone containing offset 256GB and the following zone
>>
>> If the location of the zones are outside of disk, we don't record the
>> superblock copy.
>>
>> These addresses are arbitrary, but using addresses that are too large
>> reduces superblock reliability for smaller devices, so we do not want to
>> exceed 1T to cover all case nicely.
>>
>> Also, LBAs are generally distributed initially across one head (platter
>> side) up to one or more zones, then go on the next head backward (the other
>> side of the same platter), and on to the following head/platter. Thus using
>> non sequential fixed addresses for superblock logging, such as 0/64G/256G,
>> likely result in each superblock copy being on a different head/platter
>> which improves chances of recovery in case of superblock read error.
>>
>> These zones are reserved for superblock logging and never used for data or
>> metadata blocks. Zones containing the offsets used to store superblocks in
>> a regular btrfs volume (no zoned case) are also reserved to avoid
>> confusion.
>>
>> Note that we only reserve the 2 zones per primary/copy actually used for
>> superblock logging. We don't reserve the ranges possibly containing
>> superblock with the largest supported zone size (0-16GB, 64G-80GB,
>> 256G-272GB).
>>
>> The first copy position is much larger than for a regular btrfs volume
>> (64M). This increase is to avoid overlapping with the log zones for the
>> primary superblock. This higher location is arbitrary but allows supporting
>> devices with very large zone size, up to 32GB. But we only allow zone sizes
>> up to 8GB for now.
>>
>
> Ok it took me a few reads to figure out what's going on.
>
> The problem is that with large zone sizes, our current choices put the back up
> super blocks waaaayyyyyy out on the disk, correct? So instead you've picked
> arbitrary byte offsets, hoping that they'll be closer to the front of the disk
> and thus actually be useful.
>
> And then you've introduced the 8gib zone size as a way to avoid problems where
> we get the same zone for the backup supers.
>
> Are these statements correct? If so the changelog should be updated to make
> this clear up front, because it took me a while to work that out.

No, the problem is that we're placing superblocks into specific zones,
regardless of the zone size. This creates a problem when you need to
inspect a file system but don't have the block device available, because
you can't look at the zone size to calculate where the superblocks are on
the device.

With this change we're placing the superblocks not into specific zone
numbers, but into the zones starting at specific offsets. We're taking 8G
as the maximum expected zone size, to make sure the superblock zones don't
overlap. Currently SMR disks have a zone size of 256MB and we're expecting
ZNS drives to be in the 1-2GB range, so this 8GB gives us room to breathe.

Hope this helps clear up any confusion.

Byte,
	Johannes
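
To make the calculation concrete, here is a minimal sketch of how the
superblock log zones could be derived from the fixed offsets in the quoted
changelog (0, 64GB, 256GB), treating GB as GiB. It is not the kernel code:
the function name sb_log_zones, the constant names, and the rule of counting
only whole zones on the device are assumptions made for illustration.

```python
GIB = 1 << 30

# Fixed byte offsets from the quoted changelog:
# primary superblock, first copy, second copy.
SB_LOG_OFFSETS = (0, 64 * GIB, 256 * GIB)

# Largest zone size allowed for now, per the changelog.
MAX_ZONE_SIZE = 8 * GIB


def sb_log_zones(zone_size, device_size):
    """Return {mirror: (first_zone, second_zone)} for each copy that fits.

    Hypothetical helper: each copy uses the zone containing its fixed
    offset plus the following zone. A copy whose zone pair does not fit
    on the device is skipped.
    """
    if zone_size <= 0 or zone_size > MAX_ZONE_SIZE:
        raise ValueError("unsupported zone size")

    nr_zones = device_size // zone_size  # assumption: whole zones only
    zones = {}
    for mirror, offset in enumerate(SB_LOG_OFFSETS):
        first = offset // zone_size  # zone containing the fixed offset
        if first + 1 >= nr_zones:
            continue  # pair would be outside the disk: no copy recorded
        zones[mirror] = (first, first + 1)
    return zones


if __name__ == "__main__":
    # 1 TiB device with 256 MiB zones (typical SMR zone size today).
    print(sb_log_zones(256 << 20, 1024 * GIB))
```

The zone numbers change with the zone size, but the byte offsets do not,
which is what lets a tool find the superblocks in a dumped image without
the zone information.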