On Wed, Sep 02, 2015 at 12:29:06PM +0200, Christian Rohmann wrote:
> Hello btrfs-enthusiasts,
> 
> I have a rather big btrfs RAID6 with currently 12 devices. It used to
> be only 8 drives of 4TB each, but I successfully added 4 more drives of
> 1TB each at some point. What I am trying to find out, and that's my
> main reason for posting this, is how to balance the data on the drives
> now.
> 
> I am wondering what I should read from this "btrfs filesystem show"
> output:
> 
> --- cut ---
>         Total devices 12 FS bytes used 19.23TiB
>         devid    1 size 3.64TiB used 3.64TiB path /dev/sdc
>         devid    2 size 3.64TiB used 3.64TiB path /dev/sdd
>         devid    3 size 3.64TiB used 3.64TiB path /dev/sde
>         devid    4 size 3.64TiB used 3.64TiB path /dev/sdf
>         devid    5 size 3.64TiB used 3.64TiB path /dev/sdh
>         devid    6 size 3.64TiB used 3.64TiB path /dev/sdi
>         devid    7 size 3.64TiB used 3.64TiB path /dev/sdj
>         devid    8 size 3.64TiB used 3.64TiB path /dev/sdb
>         devid    9 size 931.00GiB used 535.48GiB path /dev/sdg
>         devid   10 size 931.00GiB used 535.48GiB path /dev/sdk
>         devid   11 size 931.00GiB used 535.48GiB path /dev/sdl
>         devid   12 size 931.00GiB used 535.48GiB path /dev/sdm
You had some data on the first 8 drives with 6 data+2 parity, then added
four more. From that point on, you were adding block groups with 10
data+2 parity. At some point, the first 8 drives became full, and then
new block groups have been added only to the new drives, using 2 data+2
parity.

> btrfs-progs v4.1.2
> --- cut ---
> 
> 
> First of all I wonder why the first 8 disks are shown as "full" (used =
> size), but there is 5.3TB of free space for the fs shown by "df":
> 
> --- cut ---
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sdc         33T   20T  5.3T  79% /somemountpointsomewhere
> --- cut ---

This is inaccurate because the calculations that correct for the RAID
usage probably aren't all that precise for parity RAID, particularly when
there are variable stripe sizes like you have in your FS. In fact,
they're not even all that good for things like RAID-1 (I've seen
inaccuracies on my own RAID-1 system).

> Also "btrfs filesystem df" doesn't give me any clues on the matter:
> 
> --- cut ---
> btrfs filesystem df /srv/mirror/
> Data, single: total=8.00MiB, used=0.00B
> Data, RAID6: total=22.85TiB, used=19.19TiB
> System, single: total=4.00MiB, used=0.00B
> System, RAID6: total=12.00MiB, used=1.34MiB
> Metadata, single: total=8.00MiB, used=0.00B
> Metadata, RAID6: total=42.09GiB, used=38.42GiB
> GlobalReserve, single: total=512.00MiB, used=1.58MiB
> --- cut ---

This is showing you how the "used" space from the btrfs fi show output is
divided up. It won't tell you anything about the proportion of the data
that's 6+2, the amount that's 10+2, and the amount that's 2+2 (or any
other values).

> What I am very certain about is that the "load" of I/O requests is not
> equal yet, as iostat clearly shows:
> 
> --- cut ---
> Device:  rrqm/s  wrqm/s    r/s    w/s    rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sdc       21.40    4.41  42.22  12.71  3626.12   940.79    166.29      3.82  69.38    42.83   157.60   5.98  32.82
> sdb       22.35    4.45  41.29  12.71  3624.20   941.27    169.09      4.22  77.88    46.75   178.97   6.10  32.96
> sdd       22.03    4.44  41.60  12.73  3623.76   943.22    168.13      3.79  69.45    42.53   157.48   6.05  32.85
> sde       21.21    4.43  42.30  12.74  3621.39   943.36    165.88      3.82  69.28    42.99   156.62   5.98  32.90
> sdf       22.19    4.42  41.42  12.75  3623.65   940.63    168.51      3.77  69.36    42.64   156.13   6.05  32.79
> sdh       21.35    4.46  42.25  12.68  3623.12   940.28    166.14      3.95  71.72    43.61   165.40   6.02  33.06
> sdi       21.92    4.38  41.67  12.79  3622.03   942.91    167.63      3.49  63.83    40.23   140.74   6.02  32.77
> sdj       21.31    4.41  42.26  12.72  3625.32   941.50    166.12      3.99  72.25    44.50   164.44   6.00  33.01
> sdg        8.90    4.97  12.53  21.16  1284.47  1630.08    173.02      0.83  24.61    27.31    23.02   1.77   5.95
> sdk        9.14    4.94  12.30  21.19  1284.61  1630.02    174.07      0.79  23.41    26.59    21.57   1.76   5.91
> sdl        8.88    4.95  12.58  21.19  1284.46  1630.06    172.62      0.80  23.80    25.68    22.68   1.78   6.00
> sdm        9.07    4.85  12.35  21.29  1284.43  1630.01    173.26      0.79  23.57    26.57    21.83   1.77   5.94
> --- cut ---
> 
> Should I run btrfs balance on the filesystem? If so, what FILTERS would
> I then use in order for the data and therefore requests to be better
> distributed?

Yes, you should run a balance. You probably need to free up some space on
the first 8 drives first, to give the allocator a chance to use all 12
devices in a single stripe. This can also be done with a balance. Sadly,
with the striped RAID levels (0, 10, 5, 6), it's generally harder to
ensure that all of the data is striped as evenly as is possible(*). I
don't think there are any filters that you should use -- just balance
everything. The first time probably won't do the job fully. A second
balance probably will.
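For the actual invocation, a rough sketch -- I'm assuming the filesystem
is the one mounted at /srv/mirror, going by your fi df output above, so
adjust the path as needed:

   # full balance, no filters: rewrites every block group in the FS
   btrfs balance start /srv/mirror

   # check progress at any time from a second shell
   btrfs balance status /srv/mirror

   # a balance this size can be paused and picked up again later if needed
   btrfs balance pause /srv/mirror
   btrfs balance resume /srv/mirror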
These are going to take a very long time to run (in your case, I'd guess
at least a week for each balance). I would recommend starting the balance
in a tmux or screen session, and also creating a second shell in the same
session to run monitoring processes. I typically use something like:

   watch -n60 sudo btrfs fi show\; echo\; btrfs fi df /mountpoint\; echo\; btrfs bal stat /mountpoint

   Hugo.

(*) Hmmm... idea for a new filter: min/max stripe width? Then you could
balance only the block groups that aren't at full width, which is
probably what's needed here.

-- 
Hugo Mills             | Comic Sans goes into a bar, and the barman says, "We
hugo@... carfax.org.uk | don't serve your type here."
http://carfax.org.uk/  |
PGP: E2AB1DE4          |