Re: degraded permanent mount option

Tomasz Pala Mon, 29 Jan 2018 00:54:25 -0800

On Sun, Jan 28, 2018 at 17:00:46 -0700, Chris Murphy wrote:

> systemd can't possibly need to know more information than a person
> does in the exact same situation in order to do the right thing. No
> human would wait 10 minutes, let alone literally the heat death of the
> planet for "all devices have appeared" but systemd will. And it does


We're repeating ourselves already: systemd waits for THE
btrfs-compound-device, not for ALL the block devices. Just like it
'waits' for someone to plug a USB pendrive in.

It is a btrfs choice not to expose the compound device as a separate one
(like every other device manager does), it is a btrfs drawback that it
doesn't provide anything except this IOCTL with its logic, it is a
btrfs drawback that there is nothing to push the assembly into an "OK,
going degraded" state, it is a btrfs drawback that there are no states...

I've said it already - pretend the /dev/sda1 device doesn't
exist until assembled. If this overlapping usage was designed with
'easier mounting' in mind, this is simply bad design.

> that by its own choice, its own policy. That's the complaint. It's
> choosing to do something a person wouldn't do, given identical
> available information.

You are expecting systemd to mix in functions of kernel and udev.
There is NO concept of 'assembled stuff' in systemd AT ALL.
There is NO concept of 'waiting' in udev AT ALL.
If you want to do some crazy interlayer shortcuts just implement btrfsd.

> There's nothing the kernel is doing that's
> telling systemd to wait for goddamn ever.

There's nothing the kernel is doing that's
telling udev there IS a degraded device assembled to be used.

There's nothing a userspace-thing is doing that's
telling udev to mark degraded device as mountable.

There is NO DEVICE to be mounted, so systemd doesn't mount it.

The difference is:

YOU think that the sda1 device is ephemeral, as it's covered by the sda1
btrfs device that COULD BE mounted.

I think that there is real sda1 device, following Linux rules of system
registration, which CAN be overtaken by ephemeral btrfs-compound device.
Can I mount that thing above sda1 block device? ONLY when it's properly
registered in the system.

Does the btrfs-compound-device register in the system? Yes, but only
when fully populated.

Just don't expect people to break their code with broken designs just
to overcome your own limitations. If you want systemd to mount a degraded
btrfs volume, just MAKE IT REGISTER in the system.

How can btrfs register in the system while degraded? Either through some
userspace daemon handling btrfs volume states (which are missing from
the kernel), or through some IOCTLs altering in-kernel states.


So for the last time: nobody will break his own code to patch over code
missing from another (actively maintained) subsystem.

If you expect degraded mounts, there are 2 choices:

1. implement degraded STATE _some_where_ - udev would handle falling
   back to degraded mount after specified timeout,

2. change this IOCTL to _always_ return 1 - udev would register any
   btrfs device, but you will get random behaviour of mounting
   degraded/populated. But you should expect that since there is no
   concept of any state below.


Actually, this is ridiculous - you expect the degradation to be handled
in some 3rd party software?! In the init system? When the only thing
you've got is the 'degraded' mount option?!
What next - moving MD and LVM logic into systemd?

This is not systemd's job - there are
btrfs-specific kernel cmdline options to be parsed (allowing degraded
volumes), there is tracking of volume health required.
Yes, a device manager needs to track its components, a RAID controller
needs to track the minimum required redundancy. It's not only about
mounting. But doing the degraded mounting is easy, only this one
particular ioctl needs to be fixed:

1. counted devices<all  => not_ready

2. counted devices<all BUT
- 'go degraded' received from userspace or kernel cmdline OR
- volume IS mounted and doesn't report errors (i.e. mount -o degraded
  DID succeed)          => ok_degraded

3. counted devices==all => ok
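
As a rough illustration only, the three cases above could be sketched
like this. It is Python purely for readability, and every name in it
(Readiness, volume_readiness, go_degraded, mounted_clean) is made up for
this sketch. Nothing like it exists in btrfs today; that is the point.

```python
from enum import Enum


class Readiness(Enum):
    # The three states proposed above (hypothetical names).
    NOT_READY = "not_ready"
    OK_DEGRADED = "ok_degraded"
    OK = "ok"


def volume_readiness(counted, total, go_degraded=False, mounted_clean=False):
    """Classify a multi-device volume.

    counted       -- number of member devices seen so far
    total         -- number of devices the volume expects
    go_degraded   -- 'go degraded' was received (userspace or kernel cmdline)
    mounted_clean -- volume is already mounted degraded and reports no errors
    """
    if counted >= total:
        return Readiness.OK            # case 3: all devices present
    if go_degraded or mounted_clean:
        return Readiness.OK_DEGRADED   # case 2: incomplete, but allowed
    return Readiness.NOT_READY         # case 1: incomplete, keep waiting
```

With a 2-device volume and only one device present, volume_readiness(1, 2)
stays at NOT_READY until somebody sets go_degraded.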


If btrfs DISTINGUISHES these two states, systemd would be able to use them.


You might ask why it is important for the state to be kept inside some
btrfs-related stuff, like the kernel or btrfsd, while a systemd timer
could do the same and 'just mount degraded'. The answer is simple:
systemd.timer is just a sane default CONFIGURATION, that can be EASILY
changed by the system administrator. But somewhere, sometime, someone
would have a NEED for a totally different set of rules for handling
degraded volumes, just like MD or LVM do. It would be totally
irresponsible to hardcode any mount-degraded rule inside systemd itself.
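
To make the split concrete, here is a minimal sketch of the
configuration side, assuming btrfs reported the states not_ready,
ok_degraded and ok. The names DegradedPolicy and next_step are
hypothetical, and the 90 second default is an arbitrary value picked
for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DegradedPolicy:
    # Seconds to wait for missing devices; None means "never go degraded".
    # Administrator-tunable; the default is arbitrary for this sketch.
    timeout_s: Optional[float] = 90.0


def next_step(state: str, waited_s: float, policy: DegradedPolicy) -> str:
    """Decide what the timer side does, given the state btrfs reports.

    state is one of 'not_ready', 'ok_degraded', 'ok' (hypothetical).
    """
    if state in ("ok", "ok_degraded"):
        # The device is registered, so it can simply be mounted.
        return "mount"
    if policy.timeout_s is not None and waited_s >= policy.timeout_s:
        # Ask btrfs (kernel or btrfsd) to 'go degraded'; the state itself
        # still lives on the btrfs side, not in the init system.
        return "request_degraded"
    return "wait"
```

The timer only decides WHEN to ask. Somebody who needs different rules
swaps the policy and never touches the place where the state is kept.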

That is exactly why this must go through udev - udev is responsible
for handling devices in the Linux world. How can I register a btrfs
device in udev, since it's overlapping the block device? I can't - the
ioctl is one-way, it doesn't accept any userspace feedback.

-- 
Tomasz Pala <go...@pld-linux.org>
