It might be a good idea to discuss whether labelclear in OpenZFS should or
should not do this: https://github.com/zfsonlinux/zfs/issues/3156
(I'm using illumos device names because this is the OpenZFS repo, but the same
points apply to ZoL (sda/sda1) and OS X (disk3/disk3s1) device names. I don't
know enough about FreeBSD device names to comment, but I assume the same issues
apply there.)
In my opinion, labelclear should clear the exact device it's given, as it does
now. In other words, it should take the same block device that zdb -l would.
However, it is worth noting that this can be quite confusing to users.
Suppose we create a pool on a physical disk with this command:
zpool create notrpool c2t1d0
zpool status will report notrpool is on c2t1d0.
If we destroy the pool, and want to clear the label, the command is zpool
labelclear c2t1d0, right? Wrong. Or is it zpool labelclear /dev/rdsk/c2t1d0?
Also wrong.
Currently, the correct answer is zpool labelclear /dev/rdsk/c2t1d0s0, which is
indeed the same device that we would need to supply to zdb -l in order to read
the label. zdb -l c2t1d0 will not work. zdb -l /dev/rdsk/c2t1d0 will not work.
zdb -l /dev/rdsk/c2t1d0s0 is what is required. So it is logical that
/dev/rdsk/c2t1d0s0 is the device zpool labelclear expects.
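To make that concrete, here is a minimal shell sketch of the whole sequence,
assuming illumos device naming and the c2t1d0 disk from the example above (the
exact output will of course vary):

    # Create a pool on the whole disk; ZFS auto-partitions it and puts the
    # label on slice 0 (whole_disk=1).
    zpool create notrpool c2t1d0

    # zpool status reports the vdev as "c2t1d0", with the s0 stripped.
    zpool status notrpool

    zpool destroy notrpool

    # Reading the label: only the slice works.
    zdb -l c2t1d0                # no label found
    zdb -l /dev/rdsk/c2t1d0      # no label found
    zdb -l /dev/rdsk/c2t1d0s0    # prints the label

    # Clearing the label: same device that zdb -l wants.
    zpool labelclear /dev/rdsk/c2t1d0s0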
However, we're using the zpool command not the zdb command, and zpool status
tells us the device is c2t1d0. So many (most?) users will naturally conclude
that they should zpool labelclear c2t1d0 or at worst zpool labelclear
/dev/rdsk/c2t1d0. zpool labelclear /dev/rdsk/c2t1d0s0 violates the principle of
least surprise from that perspective.
The original sin here is the lying that zpool status does in the case where
whole_disk=1. It strips off the s0 if whole_disk=1 in the name of readability
and to communicate that ZFS was given the whole device in the original
create/add/attach command.
The confusion is compounded by the fact that sometimes zpool status DOES print
the same device that labelclear expects. This is true whenever whole_disk=0
(modulo the fact that labelclear wants the full path and ZFS only prints the
basename of whole_disk=0 block devices, though it does print the full path in
the case of file pools).
So sometimes labelclear expects the exact device zpool status reports (file
pools). Sometimes labelclear expects the device zpool status reports but with
the dirname prepended (virtual disks and physical disk partitions). And
sometimes (the most common case) labelclear expects the device zpool status
reports but with both the dirname prepended and the partition name appended.
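Spelling those three cases out as commands (a sketch; the file and device names
here are made up for illustration):

    # File pool: zpool status prints the full path; labelclear takes it as-is.
    zpool labelclear /tank/vdevs/file1

    # whole_disk=0 (e.g. a slice or virtual disk given explicitly): status
    # prints only the basename (c2t1d0s3), so the dirname must be prepended.
    zpool labelclear /dev/rdsk/c2t1d0s3

    # whole_disk=1 (the common case): status prints c2t1d0, so both the
    # dirname and the slice must be added.
    zpool labelclear /dev/rdsk/c2t1d0s0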
OK, so users can get confused. What else is new? RTFM, etc. All that I've said
so far could be treated as a documentation issue.
But the situation gets worse.
Suppose the user, seeing c2t1d0 in the zpool status output, DOES run "zpool
labelclear /dev/rdsk/c2t1d0". The user is likely to conclude that the command
was successful, because mistakenly supplying the whole device has the side
effect of destroying the GPT. The GPT was there before the labelclear command;
partitions were visible, etc. After the command, the GPT is gone and the
partitions are no longer listed. So the command worked, right?
After all, it did do something.
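A sketch of that trap, continuing the c2t1d0 example:

    # Mistakenly clear the whole device instead of the slice:
    zpool labelclear /dev/rdsk/c2t1d0

    # The GPT at the start of the disk gets zeroed, so the partitions vanish
    # from tools like format/prtvtoc -- which looks like success -- while the
    # real ZFS label, sitting at the old s0 offsets, is untouched.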
So thinking the labelclear command was successful, the user proceeds to try to
reuse the device for a new pool. Unfortunately, this can result in a
"mysterious" EBUSY being returned by the kernel. Of course, now there's lots of
fuss trying to determine what could be busying the device, and troubleshooting
ensues.
Why EBUSY? Userland looks at the disk that had its GPT (accidentally) wiped,
does its checks to make sure the device isn't being used by another file system
or another pool, etc., and all looks good to go. So it proceeds to apply the
standard whole_disk=1 auto-partitioning at exactly the same offsets that were
previously used by the dead GPT. With the auto-partitioning completed, userland
then hands its freshly minted partition off to the kernel. The kernel, being
wiser than userland and the final failsafe against dangerous behavior, sees the
old pool's label on the partition it was just handed and balks at creating a
new one, because the device is already in use by another pool and we wouldn't
want to blow that away.
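Continuing the sketch, this is roughly what the users in the threads linked
below presumably ran into, and what eventually clears it up:

    # Try to reuse the disk:
    zpool create newpool c2t1d0     # fails with a "mysterious" EBUSY

    # The failed create already re-partitioned the disk at the same offsets
    # the dead GPT used, so the new s0 lines up with the old label, which the
    # whole-device labelclear never touched:
    zdb -l /dev/rdsk/c2t1d0s0       # still shows the old pool

    # Clearing the label where it actually lives resolves the EBUSY
    # (-f may be needed if the label claims the pool is still active):
    zpool labelclear -f /dev/rdsk/c2t1d0s0
    zpool create newpool c2t1d0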
So has this happened in real life or am I telling a plausible tale?
2011: https://github.com/zfsonlinux/zfs/issues/440#issuecomment-3144878
2015: https://openzfsonosx.org/forum/viewtopic.php?f=26&t=2323&start=10#p6118
(Two separate users if you read from the beginning of the topic. The link is to
the final post confirming the solution.)
(If nothing else, userland should probably do its busy checks a second time
AFTER the partitioning to prevent the kernel from having to step in and issue
mysterious EBUSY errors.)
Should zpool labelclear be made smarter? Should it accept the device names
zpool status reports? It is probably safest and most flexible to have it only
operate on the exact device it is given, full path required (same exact path as
zdb -l wants, after all), just as it does now.
But I can certainly see why the temptation exists to propagate zpool status's
lying to other commands. Some might suggest automatically labelclearing the
partition not the full device if whole_disk=1 and the user supplies the full
device, but that may not always be readable, and it strikes me as dangerous to
"guess" at what the user really meant when we're zeroing things out. Also, if
we're going to automatically use the partition instead, how would you actually
get labelclear to operate on the full device if that is indeed what you
actually intend? There's also the snag that the partition table may already be
gone in which case double guessing would be required. Others might suggest
clearing both the whole device and the partition. And I'm sure there are other
possible approaches. It's also worth keeping in mind that whole_disk=1 may in
the future be possible with arbitrary partition numbers and is certainly
theoretically separable and logically distinct from the auto-partitioning,
though they do usually go together nicely.