Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-09 Thread Nils Kattenbeck
Hello,

I have now created an issue in the systemd repository where this can
be tracked further as this seems to be something which would fit into
sd-sysupdate itself: https://github.com/systemd/systemd/issues/30855

Kind regards, Nils


Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-02 Thread Lennart Poettering
On Tue, 02.01.24 14:40, Nils Kattenbeck (nilskem...@gmail.com) wrote:

> > > does sysupdate currently support any way to slowly roll out updates
> > > where the server providing the files can be in control? [...]
> >
> > This is currently not available, no.
> >
> > The idea so far was always that the server is dumb, and the client
> > picks the release it wants.
>
> I feel like it would be more flexible to have the client mostly
> handling transferring and applying the data and any additional logic
> should be handled by either the server or secondary applications which
> call into sysupdate (or its future dbus API).

Well, our idea was really that you can use a bog standard static http
server to serve this stuff and get as much of the feature set as you
could possibly get.

> > I have thought about this usecase a while back, and my thinking was
> > that such a staged update logic should be driven by the machine
> > ID. i.e. we should teach sysupdate a simple logic that allows pattern
> > matching of new versions based on some arithmetic of the machine
> > ID. More specifically, include some value in the URL pattern that
> > indicates the percentage of hosts that shall update to this
> > release. Then, each client takes its machine ID, treats it as an
> > integer and calculates modulo 100 of it or so, and then checks if the
> > resulting value is below the intended percentage, and if so it
> > updates, otherwise it doesn't.
> >
> > (or something like that, the above is probably not ideal, since it
> > would mean it's always the same hosts that try a new release first,
> > and it probably should be evened out across the set of clients).
>
> Any logic based on the machine ID would also have the problem I
> mentioned below that the ratios would be skewed for stateless devices
> which cannot persist their machine id to disk.
>
> One would at least be able to override it with something persistent
> like a MAC address though this could be exposed as some argument or
> environment variable which a secondary application could set before
> calling sysupdate.

Yeah, we have something similar with the seed logic of repart already
(by default repart derives partition uuids to create from the machine
ID but you can also specify a seed explicitly), I see no problem with
adding the same to sysupdate. That should be trivial, as it just adopts
a scheme we already introduced in one more place. I am fully on board
with that.
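
A minimal sketch of the "explicit seed, else machine ID" fallback
described above. The environment variable name EXAMPLE_ROLLOUT_SEED and
the function rollout_seed() are hypothetical and exist only for
illustration; sysupdate does not currently provide such an option.

```python
from typing import Mapping

# Hypothetical variable name, used only for this illustration.
SEED_ENV = "EXAMPLE_ROLLOUT_SEED"


def rollout_seed(env: Mapping[str, str], machine_id: str) -> str:
    """Return an explicitly configured seed, falling back to the machine ID."""
    explicit = env.get(SEED_ENV, "").strip()
    if explicit:
        # A secondary application supplied something persistent.
        return explicit
    # Default: derive everything from the machine ID.
    return machine_id.strip()


machine_id = "0123456789abcdef0123456789abcdef"  # made-up example ID
print(rollout_seed({}, machine_id))
print(rollout_seed({SEED_ENV: "device-serial-0042"}, machine_id))
```

A stateless device could then pass any identifier that survives
reboots, so its position in the rollout no longer changes on every boot.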

I'd also be fine with a kernel cmdline option or so which allows
fine-tuning where PID 1 takes the machine ID from if it generates a
new one. Right now it takes the SMBIOS ID when running in a VM, and
has a similar mechanism for containers. We could probably add
something to optionally tell it to pull the ID to use from smbios/dt
even on physical systems, or even from the TPM.

(I am not sure how far the MAC thing would work. AFAIK on a lot of
embedded systems the MAC is expected to be randomly generated by
software, hence would not be useful as an identifier here. Also, it's
directly publicly visible, which makes it too easily guessable. And
there's the raciness issue: usually device drivers for such auxiliary
hardware are loaded relatively late, and we want the machine ID relatively
early.)

> > > I also remember there being a discussion about plugging in different
> > > sd-pull like implementations/backends[1] to support delta updates,
> > > other transports, or TLS client authentication. This could at least be
> > > adapted to support my idea to send the machine-id as an HTTP header
> > > (e.g. X-MACHINE-ID).
> >
> > If we can avoid it, I'd always adopt a logic where identifying info
> > doesn't have to be sent to the server. After all the logic should be
> > generic and applicable in scenarios where the client should get
> > anonymity as much as it wants.
>
> If the client automatically applies updates the server could always
> deliver an image which exposes information by e.g. simply updating the
> Path= to include %m somewhere in it.
> Though I agree that always sending such information in headers would
> not be optimal.
>
> I also found out that sd-import drops query parameters from the URL.

I guess we could change this. if a query parameter is explicitly
specified I see no reason to unconditionally drop it.

> If this were not the case my use case would already be possible by
> embedding the machine ID as part of the query.
> This would also make it possible to opt in to sending the information.
>
> The problem I think is that there are two user groups of sysupdate
> with different requirements.
> On one hand we have end user distributions with A/B style updates
> where the distribution only has limited to no interest in precise
> control of updates and user devices and the users wish for anonymity.
> On the other hand though are enterprises which deploy sysupdate for
> (I)IoT devices. In these cases devices commonly have to be registered
> anyhow, and the enterprise controls how updates are 

Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-02 Thread Nils Kattenbeck
> > does sysupdate currently support any way to slowly roll out updates
> > where the server providing the files can be in control? [...]
>
> This is currently not available, no.
>
> The idea so far was always that the server is dumb, and the client
> picks the release it wants.

I feel like it would be more flexible to have the client mostly
handling transferring and applying the data and any additional logic
should be handled by either the server or secondary applications which
call into sysupdate (or its future dbus API).

> I have thought about this usecase a while back, and my thinking was
> that such a staged update logic should be driven by the machine
> ID. i.e. we should teach sysupdate a simple logic that allows pattern
> matching of new versions based on some arithmetic of the machine
> ID. More specifically, include some value in the URL pattern that
> indicates the percentage of hosts that shall update to this
> release. Then, each client takes its machine ID, treats it as an
> integer and calculates modulo 100 of it or so, and then checks if the
> resulting value is below the intended percentage, and if so it
> updates, otherwise it doesn't.
>
> (or something like that, the above is probably not ideal, since it
> would mean it's always the same hosts that try a new release first,
> and it probably should be evened out across the set of clients).

Any logic based on the machine ID would also have the problem I
mentioned below that the ratios would be skewed for stateless devices
which cannot persist their machine id to disk.
One would at least be able to override it with something persistent
like a MAC address though this could be exposed as some argument or
environment variable which a secondary application could set before
calling sysupdate.

> This would then mean for the server that it would first serve
> foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of
> the hosts would update to it. And then, once you collected enough
> feedback you'd rename the file to foobar_47.11_25.raw and 25% of the
> hosts would switch over. Finally you'd set the value to 100 (or maybe
> just drop it, which should be considered equivalent to 100), and then
> all remaining hosts would update.
>
> The effect of this is that clients could still explicitly upgrade if
> they want, and the updates would be entirely driven by the clients,
> but simply via naming the download images the server can control that
> "by default" only the chosen number of clients update.

The explicit update by clients is definitely a nice bonus though this
can also be achieved by a secondary set of definitions looking for
files under s3.domain.com/rc/.

> > Currently it seems like I would have to implement a different service
> > which calls the sysupdate binary (or uses dbus once #28134 has landed)
> > and then decides based on some other information.
> >
> > One idea I had would be that systemd-pull could send the machine-id
> > based on which the server could then decide to provide the newer file
> > (e.g. last two chars == "00" would roll it out to ~1/256). Though I am
> > not sure if sd-pull is supposed to be "anonymous", i.e. do not provide
> > this identifying information. Another drawback of this would be that
> > stateless systems which reboot often get a new machine-id each boot,
> > thus having an increased chance to get the newer version.
>
> So this idea is not entirely different from my idea, I was just
> thinking about pushing this into sysupdate rather than pull.
>
> > Does anything like this already exist or is planned? Or should that be
> > done by different applications on the client side?
>
> I think it makes a ton of sense to add this to sysupdate. Would love
> to review/merge a patch for that.
>
> > I also remember there being a discussion about plugging in different
> > sd-pull like implementations/backends[1] to support delta updates,
> > other transports, or TLS client authentication. This could at least be
> > adapted to support my idea to send the machine-id as an HTTP header
> > (e.g. X-MACHINE-ID).
>
> If we can avoid it, I'd always adopt a logic where identifying info
> doesn't have to be sent to the server. After all the logic should be
> generic and applicable in scenarios where the client should get
> anonymity as much as it wants.

If the client automatically applies updates the server could always
deliver an image which exposes information by e.g. simply updating the
Path= to include %m somewhere in it.
Though I agree that always sending such information in headers would
not be optimal.

I also found out that sd-import drops query parameters from the URL.
If this were not the case my use case would already be possible by
embedding the machine ID as part of the query.
This would also make it possible to opt in to sending the information.

The problem I think is that there are two user groups of sysupdate
with different requirements.
On one hand we have end user distributions with A/B 

Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-02 Thread Lennart Poettering
On Tue, 02.01.24 13:11, Simon McVittie (s...@collabora.com) wrote:

> Prior art: Debian/Ubuntu apt does slow rollout for packages like
> this, with simple filesystem-based http mirrors combined with "smart"
> clients. It works by adding a Phased-Update-Percentage field to the
> metadata of each package. The client calculates some sort of ID for itself
> (I don't know precisely how), and then takes the upgrade if it finds that
> its ID is in the first x% of the available range.
>
> If I understand correctly, Ubuntu is using this mechanism in production
> but Debian is not.
>
> Using some sort of hash of the machine ID + the proposed version would
> probably have the behaviour you want, of choosing a different x% of
> machines to be the early-adopter set for each update?

Yes, this is what I think would be the right approach.

> > This would then mean for the server that it would first serve
> > foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of
> > the hosts would update to it. And then, once you collected enough
> > feedback you'd rename the file to foobar_47.11_25.raw and 25% of the
> > hosts would switch over. Finally you'd set the value to 100 (or maybe
> > just drop it, which should be considered equivalent to 100), and then
> > all remaining hosts would update.
>
> If you're using a hash of the machine ID + the proposed version as
> your randomization, then I think you'd want to have a single image (or
> version ID, or some other unique identifier) for each proposed update, and
> separately, a metadata field that sets *x* in the instruction "if you have
> figured out that you are in the first x% of machines, upgrade". Otherwise,
> publishing foobar_47.11_3.raw followed by foobar_47.11_25.raw would be
> more likely to result in approximately (3% + 25% = 28%) of machines
> upgrading[1], because the client doesn't know that it's actually the
> same update and would "re-roll the dice" for each republished name.

My thinking was that clients would look at multiple entries which only
differ by the percentage (i.e. are identical in name and version) and
drop all of them but the one with the highest percentage, and ignore
all others.
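
A minimal sketch of that de-duplication step, assuming each candidate
has already been parsed into a (name, version, percentage) tuple. The
tuple layout and the function name collapse_entries() are illustrative
assumptions, not part of sysupdate.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, int]  # (name, version, percentage)


def collapse_entries(entries: Iterable[Entry]) -> List[Entry]:
    """Keep, per (name, version), only the entry with the highest percentage."""
    best: Dict[Tuple[str, str], int] = {}
    for name, version, pct in entries:
        key = (name, version)
        best[key] = max(pct, best.get(key, 0))
    # Sorted only to make the result deterministic (plain string order,
    # not a real version comparison).
    return sorted((n, v, p) for (n, v), p in best.items())


print(collapse_entries([
    ("foobar", "47.11", 3),
    ("foobar", "47.11", 25),
    ("foobar", "47.10", 100),
]))
```

With this, a server that leaves both foobar_47.11_3.raw and
foobar_47.11_25.raw in place is treated the same as one that only
offers the 25% entry.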

Lennart

--
Lennart Poettering, Berlin


Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-02 Thread Simon McVittie
On Tue, 02 Jan 2024 at 11:16:15 +0100, Lennart Poettering wrote:
> The idea so far was always that the server is dumb, and the client
> picks the release it wants.
> 
> I have thought about this usecase a while back, and my thinking was
> that such a staged update logic should be driven by the machine
> ID. i.e. we should teach sysupdate a simple logic that allows pattern
> matching of new versions based on some arithmetic of the machine
> ID. More specifically, include some value in the URL pattern that
> indicates the percentage of hosts that shall update to this
> release. Then, each client takes its machine ID, treats it as an
> integer and calculates modulo 100 of it or so, and then checks if the
> resulting value is below the intended percentage, and if so it
> updates, otherwise it doesn't.
> 
> (or something like that, the above is probably not ideal, since it
> would mean it's always the same hosts that try a new release first,
> and it probably should be evened out across the set of clients).

Prior art: Debian/Ubuntu apt does slow rollout for packages like
this, with simple filesystem-based http mirrors combined with "smart"
clients. It works by adding a Phased-Update-Percentage field to the
metadata of each package. The client calculates some sort of ID for itself
(I don't know precisely how), and then takes the upgrade if it finds that
its ID is in the first x% of the available range.

If I understand correctly, Ubuntu is using this mechanism in production
but Debian is not.

Using some sort of hash of the machine ID + the proposed version would
probably have the behaviour you want, of choosing a different x% of
machines to be the early-adopter set for each update?
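
A minimal sketch of such a per-version bucket, assuming SHA-256 as the
hash and a ":" separator between machine ID and version. Both choices,
and the function names, are illustrative assumptions; this is not how
apt or sysupdate compute anything.

```python
import hashlib


def rollout_bucket(machine_id: str, version: str) -> int:
    """Map (machine ID, version) to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{machine_id}:{version}".encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; the modulo bias is negligible.
    return int.from_bytes(digest[:8], "big") % 100


def should_update(machine_id: str, version: str, percentage: int) -> bool:
    """True if this host is in the first `percentage` percent for this version."""
    return rollout_bucket(machine_id, version) < percentage


machine_id = "0123456789abcdef0123456789abcdef"  # made-up example ID
for version in ("47.10", "47.11", "47.12"):
    print(version, rollout_bucket(machine_id, version))
```

Because the version is part of the hash input, the same host lands in a
different bucket for each release, while its bucket for one given
release never changes.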

> This would then mean for the server that it would first serve
> foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of
> the hosts would update to it. And then, once you collected enough
> feedback you'd rename the file to foobar_47.11_25.raw and 25% of the
> hosts would switch over. Finally you'd set the value to 100 (or maybe
> just drop it, which should be considered equivalent to 100), and then
> all remaining hosts would update.

If you're using a hash of the machine ID + the proposed version as
your randomization, then I think you'd want to have a single image (or
version ID, or some other unique identifier) for each proposed update, and
separately, a metadata field that sets *x* in the instruction "if you have
figured out that you are in the first x% of machines, upgrade". Otherwise,
publishing foobar_47.11_3.raw followed by foobar_47.11_25.raw would be
more likely to result in approximately (3% + 25% = 28%) of machines
upgrading[1], because the client doesn't know that it's actually the
same update and would "re-roll the dice" for each republished name.

smcv

[1] more precisely, (0.03 + 0.25 - (0.03 * 0.25)) because of how
conditional probabilities combine
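
The footnote's arithmetic, worked through as a small sketch. It assumes
the two "dice rolls" are independent; the function name is illustrative.

```python
def either_roll_succeeds(p_first: float, p_second: float) -> float:
    """Probability that at least one of two independent rolls succeeds."""
    return p_first + p_second - p_first * p_second


p = either_roll_succeeds(0.03, 0.25)
print(round(p, 4))  # 0.2725

# Same result via the complement: 1 - (chance that both rolls fail).
print(round(1 - (1 - 0.03) * (1 - 0.25), 4))  # 0.2725
```

So about 27.25% of machines would upgrade, slightly less than the
naive 28% and more than the intended 25%.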


Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-02 Thread Lennart Poettering
On Wed, 20.12.23 19:04, Nils Kattenbeck (nilskem...@gmail.com) wrote:

> Hey everyone,
>
> does sysupdate currently support any way to slowly roll out updates
> where the server providing the files can be in control? This would be
> used to slowly make a new version available and have it at e.g. 1%
> adoption for a day to monitor regressions before increasing the
> coverage. I was unable to find any information about it in the
> documentation.

This is currently not available, no.

The idea so far was always that the server is dumb, and the client
picks the release it wants.

I have thought about this usecase a while back, and my thinking was
that such a staged update logic should be driven by the machine
ID. i.e. we should teach sysupdate a simple logic that allows pattern
matching of new versions based on some arithmetic of the machine
ID. More specifically, include some value in the URL pattern that
indicates the percentage of hosts that shall update to this
release. Then, each client takes its machine ID, treats it as an
integer and calculates modulo 100 of it or so, and then checks if the
resulting value is below the intended percentage, and if so it
updates, otherwise it doesn't.

(or something like that, the above is probably not ideal, since it
would mean it's always the same hosts that try a new release first,
and it probably should be evened out across the set of clients).
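
A minimal sketch of the modulo check described above, assuming the
machine ID is the usual 32-character hexadecimal string. The function
name is an illustrative assumption.

```python
def in_rollout_mod100(machine_id: str, percentage: int) -> bool:
    """Return True if this host falls into the first `percentage` percent."""
    # Treat the 32 hex characters as one 128-bit integer.
    value = int(machine_id, 16)
    return value % 100 < percentage


machine_id = "0123456789abcdef0123456789abcdef"  # made-up example ID
for pct in (3, 25, 100):
    print(pct, in_rollout_mod100(machine_id, pct))
```

The result depends only on the machine ID, which is exactly the
drawback noted above: a host that qualifies at 3% for one release
qualifies at 3% for every release.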

This would then mean for the server that it would first serve
foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of
the hosts would update to it. And then, once you collected enough
feedback you'd rename the file to foobar_47.11_25.raw and 25% of the
hosts would switch over. Finally you'd set the value to 100 (or maybe
just drop it, which should be considered equivalent to 100), and then
all remaining hosts would update.
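
A minimal sketch of how a client might read the percentage out of such
a file name, treating a missing value as 100. The regular expression
and the names used here are illustrative assumptions and do not reflect
sysupdate's actual pattern matching.

```python
import re
from typing import NamedTuple, Optional


class Release(NamedTuple):
    name: str
    version: str
    percentage: int


# <name>_<version>[_<percentage>].raw, e.g. foobar_47.11_3.raw
_PATTERN = re.compile(
    r"^(?P<name>[^_]+)_(?P<version>[^_]+?)(?:_(?P<pct>\d{1,3}))?\.raw$"
)


def parse_release(filename: str) -> Optional[Release]:
    """Parse a file name; return None if it does not fit the layout."""
    m = _PATTERN.match(filename)
    if m is None:
        return None
    # A missing percentage is considered equivalent to 100.
    pct = int(m.group("pct")) if m.group("pct") is not None else 100
    if pct > 100:
        return None
    return Release(m.group("name"), m.group("version"), pct)


for fn in ("foobar_47.11_3.raw", "foobar_47.11_25.raw", "foobar_47.11.raw"):
    print(fn, parse_release(fn))
```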

The effect of this is that clients could still explicitly upgrade if
they want, and the updates would be entirely driven by the clients,
but simply via naming the download images the server can control that
"by default" only the chosen number of clients update.

> Currently it seems like I would have to implement a different service
> which calls the sysupdate binary (or uses dbus once #28134 has landed)
> and then decides based on some other information.
>
> One idea I had would be that systemd-pull could send the machine-id
> based on which the server could then decide to provide the newer file
> (e.g. last two chars == "00" would roll it out to ~1/256). Though I am
> not sure if sd-pull is supposed to be "anonymous", i.e. do not provide
> this identifying information. Another drawback of this would be that
> stateless systems which reboot often get a new machine-id each boot,
> thus having an increased chance to get the newer version.

So this idea is not entirely different from my idea, I was just
thinking about pushing this into sysupdate rather than pull.

> Does anything like this already exist or is planned? Or should that be
> done by different applications on the client side?

I think it makes a ton of sense to add this to sysupdate. Would love
to review/merge a patch for that.

> I also remember there being a discussion about plugging in different
> sd-pull like implementations/backends[1] to support delta updates,
> other transports, or TLS client authentication. This could at least be
> adapted to support my idea to send the machine-id as an HTTP header
> (e.g. X-MACHINE-ID).

If we can avoid it, I'd always adopt a logic where identifying info
doesn't have to be sent to the server. After all the logic should be
generic and applicable in scenarios where the client should get
anonymity as much as it wants.

The machine-id we usually consider a "half-secret", i.e. all local
programs get access to it (unless sandboxed), but they are not
supposed to send it across the wire. If they really need to send
some identifier across the wire they should derive an app-specific ID
instead, which we make easy to acquire via
sd_id128_get_machine_app_specific().
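
A rough sketch of the idea behind an app-specific ID: a keyed hash of
the application ID with the machine ID as key, truncated to 128 bits.
The sd_id128_get_machine_app_specific() documentation describes an
HMAC-SHA256 construction along these lines; this sketch only
illustrates the principle and is not guaranteed to produce the same
bits as the real function. The IDs below are made up.

```python
import hashlib
import hmac


def app_specific_id(machine_id: str, app_id: str) -> str:
    """Derive a 128-bit ID that does not reveal the machine ID itself."""
    key = bytes.fromhex(machine_id)  # machine ID is the HMAC key
    msg = bytes.fromhex(app_id)      # application ID is the message
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[:16].hex()         # truncate to 128 bits


machine_id = "0123456789abcdef0123456789abcdef"  # made-up example ID
app_id = "a1b2c3d4e5f60718293a4b5c6d7e8f90"      # made-up application ID
print(app_specific_id(machine_id, app_id))
```

Two applications with different application IDs get unrelated values,
and neither value can be turned back into the machine ID.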

But better than app-specific machine IDs are no machine IDs at all in
the protocol, if we can get away with it. Hence, my idea of doing the
rollout percentage logic client-side.

Lennart

--
Lennart Poettering, Berlin


Re: systemd-sysupdate support for slow rollout (aka A/B testing)

2024-01-01 Thread Nils Kattenbeck
Hello and happy New Year,

I tried to solve this by adding percent-specifiers as query parameters to
the Path= property of the sysupdate definition though to my dismay I had to
find out that they are discarded by the sd-import logic. Removing this
restriction could solve this problem as one could easily send machine id,
os version and similar information to the server.

This would in general enable fine grained control over which updates a
device sees. Also see
https://lists.freedesktop.org/archives/systemd-devel/2024-January/049889.html
for a case where this is desirable.

Kind regards, Nils

On Wed, Dec 20, 2023, 19:04 Nils Kattenbeck wrote:

> Hey everyone,
>
> does sysupdate currently support any way to slowly roll out updates
> where the server providing the files can be in control? This would be
> used to slowly make a new version available and have it at e.g. 1%
> adoption for a day to monitor regressions before increasing the
> coverage. I was unable to find any information about it in the
> documentation.
>
> Currently it seems like I would have to implement a different service
> which calls the sysupdate binary (or uses dbus once #28134 has landed)
> and then decides based on some other information.
>
> One idea I had would be that systemd-pull could send the machine-id
> based on which the server could then decide to provide the newer file
> (e.g. last two chars == "00" would roll it out to ~1/256). Though I am
> not sure if sd-pull is supposed to be "anonymous", i.e. do not provide
> this identifying information. Another drawback of this would be that
> stateless systems which reboot often get a new machine-id each boot,
> thus having an increased chance to get the newer version.
>
> Does anything like this already exist or is planned? Or should that be
> done by different applications on the client side?
> I also remember there being a discussion about plugging in different
> sd-pull like implementations/backends[1] to support delta updates,
> other transports, or TLS client authentication. This could at least be
> adapted to support my idea to send the machine-id as an HTTP header
> (e.g. X-MACHINE-ID).
>
> Greetings, Nils
>
> [1]
> https://lists.freedesktop.org/archives/systemd-devel/2023-February/048856.html
>


systemd-sysupdate support for slow rollout (aka A/B testing)

2023-12-20 Thread Nils Kattenbeck
Hey everyone,

does sysupdate currently support any way to slowly roll out updates
where the server providing the files can be in control? This would be
used to slowly make a new version available and have it at e.g. 1%
adoption for a day to monitor regressions before increasing the
coverage. I was unable to find any information about it in the
documentation.

Currently it seems like I would have to implement a different service
which calls the sysupdate binary (or uses dbus once #28134 has landed)
and then decides based on some other information.

One idea I had would be that systemd-pull could send the machine-id
based on which the server could then decide to provide the newer file
(e.g. last two chars == "00" would roll it out to ~1/256). Though I am
not sure if sd-pull is supposed to be "anonymous", i.e. do not provide
this identifying information. Another drawback of this would be that
stateless systems which reboot often get a new machine-id each boot,
thus having an increased chance to get the newer version.
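
A minimal sketch of the server-side check and of the drawback for
stateless systems. Two hexadecimal characters can take 256 values, so
one fixed suffix matches about 1 in 256 hosts. The function names are
illustrative assumptions, and the second function assumes every boot
yields a fresh, uniformly random machine-id.

```python
def server_offers_new_version(machine_id: str) -> bool:
    """Server-side rule: offer the new file if the ID ends in "00"."""
    return machine_id.lower().endswith("00")


def chance_after_reboots(reboots: int, p: float = 1 / 256) -> float:
    """Chance that at least one of `reboots` fresh machine IDs matched."""
    return 1 - (1 - p) ** reboots


print(server_offers_new_version("0123456789abcdef0123456789abcd00"))
for n in (1, 10, 100):
    print(n, round(chance_after_reboots(n), 4))
```

A device that reboots often keeps re-rolling, so its chance of being
offered the new version grows well beyond the intended ratio.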

Does anything like this already exist or is planned? Or should that be
done by different applications on the client side?
I also remember there being a discussion about plugging in different
sd-pull like implementations/backends[1] to support delta updates,
other transports, or TLS client authentication. This could at least be
adapted to support my idea to send the machine-id as an HTTP header
(e.g. X-MACHINE-ID).

Greetings, Nils

[1] 
https://lists.freedesktop.org/archives/systemd-devel/2023-February/048856.html