On 2015-11-12 03:15, Qu Wenruo wrote:
> Hi Anand,
>
> Nice work.
> But I have some small questions about it.
>
> Anand Jain wrote on 2015/11/09 18:56 +0800:
>> These set of patches provides btrfs hot spare and auto replace support
>> for you review and comments.
>>
>> First, here below are the simple example steps to configure the same:
>>
>> Add a spare device:
>> btrfs spare add /dev/sde -f
>
> I'm sorry but I didn't quite see the benefit of a spare device.
>
> Let's take the following example:
>
> 1) 2 RAID1 + 1 spare
> (A + B) + C
>
> 2) 3 RAID1
> (A + B + C)
> Let's assume they are all 12G size, and there are 3 raid1 chunks.
> Each one is 3G size.
>
> In my understanding, in normal operation case:
>
> For case 1), all raid chunks should only be allocated into 2 RAID disks,
> and spare one should contains no raid1 chunks.
>
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  |    |
> |3Ga1|  |3Ga2|  |    |
> ------  ------  |    |
> |3Gb1|  |3Gb2|  |    |
> ------  ------  |    |
> |3Gc1|  |3Gc2|  |    |
> ------  ------  ------
>
>
> For case 2), all raid1 chunks will be allocated into all 3 disks, making the
> allocation more fair.
>   A       B       C
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |free|  |free|  |free|
> ------  ------  ------
> |3Gb2|  |3Ga1|  |3Ga2|
> ------  ------  ------
> |3Gc1|  |3Gc2|  |3Gb1|
> ------  ------  ------
>
>
> At least in normal operation case, case 1) makes device C useless, and reduce
> the total usable space.
>
> In disk B failure case:
>
> For case 1), we can auto replace B with C.
> And it will copy all data chunks from A to C.
> Need to copy 9G data.
>
> And after replace:
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|  | X  |->|3Ga2|
> ------  ------  ------
> |3Gb1|  | X  |->|3Gb2|
> ------  ------  ------
> |3Gc1|  | X  |->|3Gc2|
> ------  ------  ------
>
>
>
> For case 2), we can just relocate and recover the bad chunks in B.
> It it should only need to copy 6G data.
>
> And after the "recovery", it should be much the same as case 1):
>   A       B       C
> ------  ------  ------
> |free|  | X  |  |free|
> ------  ------  ------
> |3Ga1|<\| X  |/>|3Gc1|
> ------  ------  ------
> |3Gb2| || X  |/ |3Ga2|
> ------  ------  ------
> |3Gc1| \| X  |  |3Gb1|
> ------  ------  ------
>
>
> IIRC, the only benefit of a spare device is, we can ensure there is enough
> space for a device place.(If the failing one is no larger than spare).
>
> But the cost is, increase in replace data copy and unfair chunk allocation.
>
> So I am not sure if the cost is good enough for the case.
> At least, enhancing the chunk relocation to fulfill the case 2) will bring a
> much smaller code base.
>
> Thanks,
> Qu
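
As a quick check on the copy volumes in the comparison above, here is a
minimal Python sketch. It models only the two layouts drawn in the diagrams
(three 3G raid1 chunks, device B failing). The names CHUNK_G, copy_needed_g
and recovery_plan are made up for illustration; they are not part of btrfs
or of the patch set, and real chunk allocation is more involved.

```python
# Toy model of the two layouts from the diagrams; not btrfs code.
CHUNK_G = 3  # each raid1 chunk stripe is 3G, as assumed in the example

# Stripe names follow the diagrams: "a1" and "a2" are the two copies of chunk a.
case1 = {"A": ["a1", "b1", "c1"], "B": ["a2", "b2", "c2"], "C": []}  # C is the spare
case2 = {"A": ["b2", "c1"], "B": ["a1", "c2"], "C": ["a2", "b1"]}   # 3-disk raid1


def copy_needed_g(layout, failed):
    """Data to re-create: one stripe's worth per stripe lost on the failed device."""
    return CHUNK_G * len(layout[failed])


def recovery_plan(layout, failed):
    """For each lost stripe, name the surviving device holding the other copy."""
    plan = []
    for stripe in layout[failed]:
        chunk = stripe[0]  # "a1" belongs to chunk "a"
        source = next(
            dev
            for dev, stripes in layout.items()
            if dev != failed and any(s[0] == chunk for s in stripes)
        )
        plan.append((stripe, source))
    return plan


if __name__ == "__main__":
    print("case 1:", copy_needed_g(case1, "B"), "G", recovery_plan(case1, "B"))
    print("case 2:", copy_needed_g(case2, "B"), "G", recovery_plan(case2, "B"))
```

Under these assumptions the model gives 9G for case 1, with every stripe
read from A, and 6G for case 2, with the reads split between A and C.
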

Interesting analysis. Another difference between the two scenarios is that in
the first case (A+B+spare) the spare does no work until it is needed: less
power consumption, and when it is needed you are using a new disk instead of
a used one.

>>
>> OR if there is a spare device which is already added before the, just
>> run
>>
>> btrfs dev scan [/dev/sde]
>>
>> this will register the spare device to the kernel.
>>
>> btrfs fi show
>> Label: none uuid: 52f170c1-725c-457d-8cfd-d57090460091
>> Total devices 2 FS bytes used 112.00KiB
>> devid 1 size 2.00GiB used 417.50MiB path /dev/sdc
>> devid 2 size 2.00GiB used 417.50MiB path /dev/sdd
>>
>> Global spare
>> device size 3.00GiB path /dev/sde
>>
>> Thats it.
>>
>> Auto replace:
>> Replace happens automatically, that is when there is any write
>> failed or flush failed, the device will be marked as failed, which
>> will stop any further IO attempt to that device. And in the next commit
>> thread cycle the auto replace will pick the spare device (/dev/sde is
>> above example) to replace the failed device. And so the btrfs volume is
>> back to a healthy state.
>>
>>
>> Its btrfs Global spare:
>> as of now only global hot spare is supported, that is hot spare(s)
>> are for all the btrfs FS in the system.
>>
>> No spare when device failed:
>> It would scan for spare device at the rate of transaction commit
>> and will trigger the auto replace when ever spare device is added.
>>
>> Priority:
>> In some future work there can be some chronological order to pick
>> a spare and the failed device.
>>
>>
>> Patches:
>>
>> Kernel:
>> First, it needs, Qu's per chunk missing device patchset,
>> which is part of the set here and also there is a light optimization
>> (patch 5/15) which was required as part of this enhancement.
>>
>> Next patches 7,8/15 brings in support, to manage the transition of
>> devices from online (no state) to offline OR failed state dynamically.
>> On top of static device state like the current "missing" state.
>>
>> Patch 9/15 fixes a bug where in we should have blocked the incompatible
>> feature at the device scan/add level instead/also at in the mount level.
>> This is because we don't have to bring a device into the device list,
>> if it is incompatible.
>>
>> Next patches 10,11,12,13/15 adds support for Spare device. For the
>> details on how to add a spare device kindly see further below.
>> For kernel with out spare feature supported the spare device
>> is kept away. And when the kernel supports the spare device, it will
>> inhibit from mounting it. Further these patch set provides helper
>> function to pick a spare device and release a spare device back to
>> the spare device pool.
>>
>> Patch 14/15 provides function for auto replace, this is mainly
>> from the existing replace code, and in the long run I see opportunity
>> to merge these code with the replace code that is triggered from
>> the user spare.
>>
>> Last 15/15, uses all these facilities, picks a failed device and
>> triggers a auto replace in a kthread (casualty_kthread())
>>
>>
>> Progs:
>> Would need 4 patches as listed below.
>>
>>
>> Known Bug:
>>
>> As now I see below stale kmem cache during module unload. Which
>> I am digging.
>> ------
>> BUG btrfs_path (Not tainted): Objects remaining in btrfs_path on
>> kmem_cache_close()
>> ------
>>
>> Anand Jain (10):
>>   btrfs: optimize btrfs_check_degradable() for calls outside of barrier
>>   btrfs: introduce device dynamic state transition to offline or failed
>>   btrfs: check device for critical errors and mark failed
>>   btrfs: block incompatible optional features at scan
>>   btrfs: introduce BTRFS_FEATURE_INCOMPAT_SPARE_DEV
>>   btrfs: add check not to mount a spare device
>>   btrfs: support btrfs dev scan for spare device
>>   btrfs: provide framework to get and put a spare device
>>   btrfs: introduce helper functions to perform hot replace
>>   btrfs: check for failed device and hot replace
>>
>> Qu Wenruo (5):
>>   btrfs: Introduce a new function to check if all chunks a OK for
>>     degraded mount
>>   btrfs: Do per-chunk check for mount time check
>>   btrfs: Do per-chunk degraded check for remount
>>   btrfs: Allow barrier_all_devices to do per-chunk device check
>>   btrfs: Cleanup num_tolerated_disk_barrier_failures
>>
>> fs/btrfs/ctree.h       |   7 +-
>> fs/btrfs/dev-replace.c | 116 ++++++++++++++++++++
>> fs/btrfs/dev-replace.h |   1 +
>> fs/btrfs/disk-io.c     | 211 +++++++++++++++++++++++------------
>> fs/btrfs/disk-io.h     |   2 -
>> fs/btrfs/super.c       |  20 +++-
>> fs/btrfs/transaction.c |   3 +-
>> fs/btrfs/volumes.c     | 283 ++++++++++++++++++++++++++++++++++++++++++++++---
>> fs/btrfs/volumes.h     |  27 +++++
>> 9 files changed, 571 insertions(+), 99 deletions(-)
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5
