>> Consider the common case of a 3-member volume with a 'raid1'
>> target profile: if the sysadm thinks that a drive should be
>> replaced, the goal is to take it out *without* converting every
>> chunk to 'single', because with 2-out-of-3 devices half of the
>> chunks will still be fully mirrored.
>> Also, removing the device to be replaced should really not be
>> the same thing as balancing the chunks, if there is space, to be
>> 'raid1' across remaining drives, because that's a completely
>> different operation.

> There is a command specifically for replacing devices. It
> operates very differently from the add+delete or delete+add
> sequences. [ ... ]

Perhaps it was not clear that I was talking about removing a device, as distinct from replacing it, and that I used "removed" instead of "deleted" deliberately, to avoid confusion with the 'delete' command.

In the everyday practice of system administration it often happens that a device should be removed first and replaced later, for example when it is suspected to be faulty, or is intermittently faulty. The replacement can then be done with 'replace' or 'add+delete' or 'delete+add', but that is a different matter. Perhaps I should not have used the generic verb "remove", but written "make unavailable" instead.

This brings up again the topic of some "confusion" in the design of the Btrfs multidevice handling logic: at least initially, one could only expand the storage space of a multidevice volume by 'add'-ing a new device, or shrink it by 'delete'-ing an existing one. What seems not to have been conceived at Btrfs design time is storage space that stays nominally constant while a device (and the chunks on it) has a state of "available" ("present", "online", "enabled") or "unavailable" ("absent", "offline", "disabled"), whether because of events or because of system administrator action. The 'missing' pseudo-device designator was added later, and 'replace' also came later, to avoid having to first expand and then shrink (or vice versa) the storage space, with the related copying.
My impression is that it would be less "confused" if the Btrfs device handling logic were changed to allow for the state of "member of the multidevice set but not actually available", and the consequent state for the chunks that ought to be on it; that would probably be essential to fixing the confusing current aspects of recovery in a multidevice set. It would be very useful even if it may require a change in the on-disk format, to distinguish the distinct states of membership and availability for devices, and to mark chunks as available or not (chunks of course being possible only on member devices).

It would also be nice to have the opposite state of "not a member of the multidevice set but actually available to it", that is, a spare device, and the related logic.

Note: simply writing to '/sys/block/$DEV/device/delete' is not a good option, because that makes the device unavailable not just to Btrfs, but to the whole system. In the ordinary practice of system administration it may well be useful to make a device unavailable to Btrfs but still available to the system, for example for testing, and in any case they are logically distinct states. That also means a member device might well be available to the system, but marked as "not available" to Btrfs.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html