Re: systemd-sysupdate support for slow rollout (aka A/B testing)
Hello, I have now created an issue in the systemd repository where this can be tracked further as this seems to be something which would fit into sd-sysupdate itself: https://github.com/systemd/systemd/issues/30855 Kind regards, Nils
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
On Di, 02.01.24 14:40, Nils Kattenbeck (nilskem...@gmail.com) wrote: > > > does sysupdate currently support any way to slowly roll out updates > > > where the server providing the files can be in control? [...] > > > > This is currently not available, no. > > > > The idea so far was always that the server is dumb, and the client > > picks the release it wants. > > I feel like it would be more flexible to have the client mostly > handling transferring and applying the data and any additional logic > should be handled by either the server or secondary applications which > call into sysupdate (or its future dbus API). Well, our idea was really that you can use a bog standard static http server to serve this stuff and get as much of the feature set as you could possibly get. > > I have thought about this usecase a while back, and my thinking was > > that such a staged update logic should be driven by the machine > > ID. i.e. we should teach sysupdate a simple logic that allows pattern > > matching of new versions based on some arithmetic of the machine > > ID. More specifically, include some value in the URL pattern that > > indicates the percentage of hosts that shall update to this > > release. Then, each client takes its machine ID, treats it as an > > integer and calculates modulo 100 of it or so, and then checks if the > > resulting value is below the intended percentage, and if so it > > updates, otherwise it doesn't. > > > > (or something like that, the above is probably not ideal, since it > > would mean it's always the same hosts that try a new release first, > > and it probably should be evened out across the set of clients). > > Any logic based on the machine ID would also have the problem I > mentioned below that the ratios would be skewed for stateless devices > which cannot persist their machine id to disk. 
> > One would at least be able to override it with something persistent > like a MAC address though this could be exposed as some argument or > environment variable which a secondary application could set before > calling sysupdate. Yeah, we have something similar with the seed logic of repart already (by default repart derives the partition UUIDs to create from the machine ID, but you can also specify a seed explicitly); I see no problem with adding the same to sysupdate. That should be trivial, and would just adopt a scheme we already introduced in one other place. I am fully on board with that. I'd also be fine with a kernel cmdline option or so which allows fine-tuning where PID 1 takes the machine ID from if it generates a new one. Right now it takes the SMBIOS ID when running in a VM, and has a similar mechanism for containers. We could probably add something to optionally tell it to pull the ID to use from smbios/dt even on physical systems, or even from the TPM. (I am not sure how well the MAC approach would work. AFAIK on a lot of embedded systems the MAC is expected to be randomly generated by software, hence would not be useful as an identifier here. Also, it's directly publicly visible, which makes it too easily guessable. And there's the raciness issue: usually device drivers for such auxiliary hardware are loaded relatively late, and we want the machine ID relatively early.) > > > I also remember there being a discussion about plugging in different > > > sd-pull like implementations/backends[1] to support delta updates, > > > other transports, or TLS client authentication. This could at least be > > > adapted to support my idea to send the machine-id as an HTTP header > > > (e.g. X-MACHINE-ID). > > > > If we can avoid it, I'd always adopt a logic whether identifying info > > doesn't have to be sent to the server. After all the logic should be > > generic and applicable in scenarios where the client should get > > anonymity as much as it wants. 
> > If the client automatically applies updates the server could always > deliver an image which exposes information by e.g. simply updating the > Path= to include %m somewhere in it. > Though I agree that always sending such information in headers would > not be optimal. > > I also found out that sd-import drops query parameters from the URL. I guess we could change this. If a query parameter is explicitly specified, I see no reason to unconditionally drop it. > If this were not the case my use case would already be possible by > embedding the machine ID as part of the query. > This would also make it possible to opt in to sending the information. > > The problem I think is that there are two user groups of sysupdate > with different requirements. > On one hand we have end user distributions with A/B style updates > where the distribution only has limited to no interest over precise > control of updates and user devices and the users wish for anonymity. > On the other hand though are enterprises which deploy sysupdate for > (I)IoT devices. In these cases devices commonly have to be registered > anyhow, and the enterprise controls how updates are
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
> > does sysupdate currently support any way to slowly roll out updates > > where the server providing the files can be in control? [...] > > This is currently not available, no. > > The idea so far was always that the server is dumb, and the client > picks the release it wants. I feel like it would be more flexible to have the client mostly handling transferring and applying the data and any additional logic should be handled by either the server or secondary applications which call into sysupdate (or its future dbus API). > I have thought about this usecase a while back, and my thinking was > that such a staged update logic should be driven by the machine > ID. i.e. we should teach sysupdate a simple logic that allows pattern > matching of new versions based on some arithmetic of the machine > ID. More specifically, include some value in the URL pattern that > indicates the percentage of hosts that shall update to this > release. Then, each client takes its machine ID, treats it as an > integer and calculates modulo 100 of it or so, and then checks if the > resulting value is below the intended percentage, and if so it > updates, otherwise it doesn't. > > (or something like that, the above is probably not ideal, since it > would mean it's always the same hosts that try a new release first, > and it probably should be evened out across the set of clients). Any logic based on the machine ID would also have the problem I mentioned below that the ratios would be skewed for stateless devices which cannot persist their machine id to disk. One would at least be able to override it with something persistent like a MAC address though this could be exposed as some argument or environment variable which a secondary application could set before calling sysupdate. > This would then mean for the server that it would first serve > foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of > the hosts would update to it. 
And then, once you collected enough > feedback you'd rename the file to foobar_47.11_25.raw and 25% of the > hosts would switch over. Finally you'd set the value to 100 (or maybe > just drop it, which should be considered equivalent to 100), and then > all remaining hosts would update. > > The effect of this is that client's could still explicitly upgrade if > they want, and the updates would be entirely driven by the clients, > but simply via naming the download images the server can control that > "by default" only the chosen number of clients update. The explicit update by clients is definitely a nice bonus though this can also be achieved by a secondary set of definitions looking for files under s3.domain.com/rc/. > > Currently it seems like I would have to implement a different service > > which calls the sysupdate binary (or uses dbus once #28134 has landed) > > and then decides based on some other information. > > > > One idea I had would be that systemd-pull could send the machine-id > > based on which the server could then decide to provide the newer file > > (e.g. last two chars == "00" would roll it out to ~1/255). Though I am > > not sure if sd-pull is supposed to be "anonymous", i.e. do not provide > > this identifying information. Another drawback of this would be that > > stateless systems which reboot often get a new machine-id each boot, > > thus having an increased chance to get the newer version. > > So this idea is not entirely different from my idea, I was just > thinking about pushing this into sysupdate rather than pull. > > > Does anything like this already exist or is planned? Or should that be > > done by different applications on the client side? > > I think it makes a ton of sense to add this to sysupdate. Would love > to review/merge a patch for that. 
> > > I also remember there being a discussion about plugging in different > > sd-pull like implementations/backends[1] to support delta updates, > > other transports, or TLS client authentication. This could at least be > > adapted to support my idea to send the machine-id as an HTTP header > > (e.g. X-MACHINE-ID). > > If we can avoid it, I'd always adopt a logic whether identifying info > doesn't have to be sent to the server. After all the logic should be > generic and applicable in scenarios where the client should get > anonymity as much as it wants. If the client automatically applies updates the server could always deliver an image which exposes information by e.g. simply updating the Path= to include %m somewhere in it. Though I agree that always sending such information in headers would not be optimal. I also found out that sd-import drops query parameters from the URL. If this were not the case my use case would already be possible by embedding the machine ID as part of the query. This would also make it possible to opt in to sending the information. The problem I think is that there are two user groups of sysupdate with different requirements. On one hand we have end user distributions with A/B
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
On Di, 02.01.24 13:11, Simon McVittie (s...@collabora.com) wrote: > Prior art: Debian/Ubuntu apt does slow rollout for packages like > this, with simple filesystem-based http mirrors combined with "smart" > clients. It works by adding a Phased-Update-Percentage field to the > metadata of each package. The client calculates some sort of ID for itself > (I don't know precisely how), and then takes the upgrade if it finds that > its ID is in the first x% of the available range. > > If I understand correctly, Ubuntu is using this mechanism in production > but Debian is not. > > Using some sort of hash of the machine ID + the proposed version would > probably have the behaviour you want, of choosing a different x% of > machines to be the early-adopter set for each update? Yes, this is what I think would be the right approach. > > This would then mean for the server that it would first serve > > foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of > > the hosts would update to it. And then, once you collected enough > > feedback you'd rename the file to foobar_47.11_25.raw and 25% of the > > hosts would switch over. Finally you'd set the value to 100 (or maybe > > just drop it, which should be considered equivalent to 100), and then > > all remaining hosts would update. > > If you're using a hash of the machine ID + the proposed version as > your randomization, then I think you'd want to have a single image (or > version ID, or some other unique identifier) for each proposed update, and > separately, a metadata field that sets *x* in the instruction "if you have > figured out that you are in the first x% of machines, upgrade". Otherwise, > publishing foobar_47.11_3.raw followed by foobar_47.11_25.raw would be > more likely to result in approximately (3% + 25% = 28%) of machines > upgrading[1], because the client doesn't know that it's actually the > same update and would "re-roll the dice" for each republished name. 
My thinking was that clients would look at multiple entries which differ only by the percentage (i.e. are identical in name and version), keep the one with the highest percentage, and ignore all others. Lennart -- Lennart Poettering, Berlin
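A minimal sketch of the "keep only the highest percentage" rule described above. It assumes the file naming from the thread's example (name_version_percentage.raw, with a missing percentage meaning 100). The pattern and the names parse() and collapse() are hypothetical and are not part of sysupdate.

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Hypothetical naming scheme taken from the thread's example:
#   <name>_<version>_<percentage>.raw   (percentage optional, default 100)
PATTERN = re.compile(
    r"^(?P<name>[^_]+)_(?P<version>[^_]+?)(?:_(?P<pct>\d{1,3}))?\.raw$"
)


class Candidate(NamedTuple):
    name: str
    version: str
    percentage: int


def parse(filename: str) -> Optional[Candidate]:
    """Parse one file name; return None if it does not fit the scheme."""
    m = PATTERN.match(filename)
    if m is None:
        return None
    pct = int(m.group("pct")) if m.group("pct") is not None else 100
    if pct > 100:
        return None
    return Candidate(m.group("name"), m.group("version"), pct)


def collapse(filenames: Iterable[str]) -> List[Candidate]:
    """For entries identical in name and version, keep the highest percentage."""
    best: Dict[Tuple[str, str], Candidate] = {}
    for filename in filenames:
        cand = parse(filename)
        if cand is None:
            continue
        key = (cand.name, cand.version)
        if key not in best or cand.percentage > best[key].percentage:
            best[key] = cand
    return list(best.values())
```

With both foobar_47.11_3.raw and foobar_47.11_25.raw on the server, a client applying this rule would see a single candidate for 47.11 at 25%, so the two names would not be evaluated as separate rollouts.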
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
On Tue, 02 Jan 2024 at 11:16:15 +0100, Lennart Poettering wrote: > The idea so far was always that the server is dumb, and the client > picks the release it wants. > > I have thought about this usecase a while back, and my thinking was > that such a staged update logic should be driven by the machine > ID. i.e. we should teach sysupdate a simple logic that allows pattern > matching of new versions based on some arithmetic of the machine > ID. More specifically, include some value in the URL pattern that > indicates the percentage of hosts that shall update to this > release. Then, each client takes its machine ID, treats it as an > integer and calculates modulo 100 of it or so, and then checks if the > resulting value is below the intended percentage, and if so it > updates, otherwise it doesn't. > > (or something like that, the above is probably not ideal, since it > would mean it's always the same hosts that try a new release first, > and it probably should be evened out across the set of clients). Prior art: Debian/Ubuntu apt does slow rollout for packages like this, with simple filesystem-based http mirrors combined with "smart" clients. It works by adding a Phased-Update-Percentage field to the metadata of each package. The client calculates some sort of ID for itself (I don't know precisely how), and then takes the upgrade if it finds that its ID is in the first x% of the available range. If I understand correctly, Ubuntu is using this mechanism in production but Debian is not. Using some sort of hash of the machine ID + the proposed version would probably have the behaviour you want, of choosing a different x% of machines to be the early-adopter set for each update? > This would then mean for the server that it would first serve > foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of > the hosts would update to it. 
And then, once you collected enough > feedback you'd rename the file to foobar_47.11_25.raw and 25% of the > hosts would switch over. Finally you'd set the value to 100 (or maybe > just drop it, which should be considered equivalent to 100), and then > all remaining hosts would update. If you're using a hash of the machine ID + the proposed version as your randomization, then I think you'd want to have a single image (or version ID, or some other unique identifier) for each proposed update, and separately, a metadata field that sets *x* in the instruction "if you have figured out that you are in the first x% of machines, upgrade". Otherwise, publishing foobar_47.11_3.raw followed by foobar_47.11_25.raw would be more likely to result in approximately (3% + 25% = 28%) of machines upgrading[1], because the client doesn't know that it's actually the same update and would "re-roll the dice" for each republished name. smcv [1] more precisely, (0.03 + 0.25 - (0.03 * 0.25)) because of how conditional probabilities combine
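A minimal sketch of the "hash of the machine ID + the proposed version" idea, together with the footnote's arithmetic. The choice of SHA-256, the ":" separator, and the function names are assumptions made for illustration; nothing here is an existing sysupdate or apt interface.

```python
import hashlib


def bucket(machine_id: str, version: str) -> int:
    """Map (machine ID, version) to a pseudo-random bucket in 0..99.

    Hashing the version together with the ID means a different set of
    machines ends up in the low buckets for each release.
    """
    digest = hashlib.sha256(f"{machine_id}:{version}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def should_update(machine_id: str, version: str, percentage: int) -> bool:
    """True if this machine falls within the first `percentage` buckets."""
    return bucket(machine_id, version) < percentage


def reroll_probability(p: float, q: float) -> float:
    """Chance of being selected by at least one of two independent draws."""
    return p + q - p * q


if __name__ == "__main__":
    # Republishing under a new name with an independent draw each time:
    print(reroll_probability(0.03, 0.25))  # about 0.2725, not 0.25
```

Because the bucket here depends only on the machine ID and the version, raising the percentage from 3 to 25 keeps the original 3% selected and adds further machines until 25% is reached. The 27.25% figure arises only if the draw is repeated independently for each republished name.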
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
On Mi, 20.12.23 19:04, Nils Kattenbeck (nilskem...@gmail.com) wrote: > Hey everyone, > > does sysupdate currently support any way to slowly roll out updates > where the server providing the files can be in control? This would be > used to slowly make a new version available and have it at e.g. 1% > adoption for a day to monitor regressions before increasing the > coverage. I was unable to find any information about it in the > documentation. This is currently not available, no. The idea so far was always that the server is dumb, and the client picks the release it wants. I have thought about this usecase a while back, and my thinking was that such a staged update logic should be driven by the machine ID. i.e. we should teach sysupdate a simple logic that allows pattern matching of new versions based on some arithmetic of the machine ID. More specifically, include some value in the URL pattern that indicates the percentage of hosts that shall update to this release. Then, each client takes its machine ID, treats it as an integer and calculates modulo 100 of it or so, and then checks if the resulting value is below the intended percentage, and if so it updates, otherwise it doesn't. (or something like that, the above is probably not ideal, since it would mean it's always the same hosts that try a new release first, and it probably should be evened out across the set of clients). This would then mean for the server that it would first serve foobar_47.11_3.raw which would be version 47.11 of the OS, and 3% of the hosts would update to it. And then, once you collected enough feedback you'd rename the file to foobar_47.11_25.raw and 25% of the hosts would switch over. Finally you'd set the value to 100 (or maybe just drop it, which should be considered equivalent to 100), and then all remaining hosts would update. 
The effect of this is that clients could still explicitly upgrade if they want, and the updates would be entirely driven by the clients, but simply via naming the download images the server can control that "by default" only the chosen number of clients update. > Currently it seems like I would have to implement a different service > which calls the sysupdate binary (or uses dbus once #28134 has landed) > and then decides based on some other information. > > One idea I had would be that systemd-pull could send the machine-id > based on which the server could then decide to provide the newer file > (e.g. last two chars == "00" would roll it out to ~1/255). Though I am > not sure if sd-pull is supposed to be "anonymous", i.e. do not provide > this identifying information. Another drawback of this would be that > stateless systems which reboot often get a new machine-id each boot, > thus having an increased chance to get the newer version. So this idea is not entirely different from mine; I was just thinking about pushing this into sysupdate rather than pull. > Does anything like this already exist or is planned? Or should that be > done by different applications on the client side? I think it makes a ton of sense to add this to sysupdate. Would love to review/merge a patch for that. > I also remember there being a discussion about plugging in different > sd-pull like implementations/backends[1] to support delta updates, > other transports, or TLS client authentication. This could at least be > adapted to support my idea to send the machine-id as an HTTP header > (e.g. X-MACHINE-ID). If we can avoid it, I'd always prefer a logic where identifying info doesn't have to be sent to the server. After all the logic should be generic and applicable in scenarios where the client should get anonymity as much as it wants. The machine-id we usually consider a "half-secret", i.e.
all local programs get access to it (unless sandboxed), but they are not supposed to send it across the wire. If they really need to send some identifier across the wire they should derive an app-specific ID instead, which we make easy to acquire via sd_id128_get_machine_app_specific(). But better than app-specific machine IDs are no machine IDs at all in the protocol, if we can get away with it. Hence, my idea of doing the rollout percentage logic client-side. Lennart -- Lennart Poettering, Berlin
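A rough sketch of the client-side check described in this message: derive an app-specific ID, treat it as an integer, and compare it modulo 100 against the published percentage. The derivation imitates the HMAC-SHA256 construction that the sd_id128_get_machine_app_specific() man page describes (application ID hashed with the machine ID as key), but it is only an illustration and is not guaranteed to be byte-compatible with libsystemd. APP_ID and both function names are made up for the example.

```python
import hashlib
import hmac

# Hypothetical 128-bit application ID for the update client (hex string).
APP_ID = "5f1f2b1c6d4e4a0c9b7a3e2d1c0b9a88"


def app_specific_id(machine_id: str, app_id: str = APP_ID) -> str:
    """Derive an ID that does not reveal the machine ID itself.

    Illustration only: HMAC-SHA256 of the app ID, keyed by the machine ID,
    truncated to 128 bits.
    """
    key = bytes.fromhex(machine_id)
    msg = bytes.fromhex(app_id)
    return hmac.new(key, msg, hashlib.sha256).digest()[:16].hex()


def in_rollout(machine_id: str, percentage: int) -> bool:
    """Treat the derived ID as an integer and compare it modulo 100."""
    return int(app_specific_id(machine_id), 16) % 100 < percentage
```

As noted earlier in the message, a check like this selects the same hosts first for every release, since nothing release-specific enters the calculation.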
Re: systemd-sysupdate support for slow rollout (aka A/B testing)
Hello and happy New Year, I tried to solve this by adding percent-specifiers as query parameters to the Path= property of the sysupdate definition, though to my dismay I found out that they are discarded by the sd-import logic. Removing this restriction could solve this problem as one could easily send machine id, os version and similar information to the server. This would in general enable fine-grained control over which updates a device sees. Also see https://lists.freedesktop.org/archives/systemd-devel/2024-January/049889.html for a case where this is desirable. Kind regards, Nils On Wed, Dec 20, 2023, 19:04 Nils Kattenbeck wrote: > Hey everyone, > > does sysupdate currently support any way to slowly roll out updates > where the server providing the files can be in control? This would be > used to slowly make a new version available and have it at e.g. 1% > adoption for a day to monitor regressions before increasing the > coverage. I was unable to find any information about it in the > documentation. > > Currently it seems like I would have to implement a different service > which calls the sysupdate binary (or uses dbus once #28134 has landed) > and then decides based on some other information. > > One idea I had would be that systemd-pull could send the machine-id > based on which the server could then decide to provide the newer file > (e.g. last two chars == "00" would roll it out to ~1/255). Though I am > not sure if sd-pull is supposed to be "anonymous", i.e. do not provide > this identifying information. Another drawback of this would be that > stateless systems which reboot often get a new machine-id each boot, > thus having an increased chance to get the newer version. > > Does anything like this already exist or is planned? Or should that be > done by different applications on the client side? 
> I also remember there being a discussion about plugging in different > sd-pull like implementations/backends[1] to support delta updates, > other transports, or TLS client authentication. This could at least be > adapted to support my idea to send the machine-id as an HTTP header > (e.g. X-MACHINE-ID). > > Greetings, Nils > > [1] > https://lists.freedesktop.org/archives/systemd-devel/2023-February/048856.html >
systemd-sysupdate support for slow rollout (aka A/B testing)
Hey everyone, does sysupdate currently support any way to slowly roll out updates where the server providing the files can be in control? This would be used to slowly make a new version available and have it at e.g. 1% adoption for a day to monitor regressions before increasing the coverage. I was unable to find any information about it in the documentation. Currently it seems like I would have to implement a different service which calls the sysupdate binary (or uses dbus once #28134 has landed) and then decides based on some other information. One idea I had would be that systemd-pull could send the machine-id based on which the server could then decide to provide the newer file (e.g. last two chars == "00" would roll it out to ~1/255). Though I am not sure if sd-pull is supposed to be "anonymous", i.e. do not provide this identifying information. Another drawback of this would be that stateless systems which reboot often get a new machine-id each boot, thus having an increased chance to get the newer version. Does anything like this already exist or is planned? Or should that be done by different applications on the client side? I also remember there being a discussion about plugging in different sd-pull like implementations/backends[1] to support delta updates, other transports, or TLS client authentication. This could at least be adapted to support my idea to send the machine-id as an HTTP header (e.g. X-MACHINE-ID). Greetings, Nils [1] https://lists.freedesktop.org/archives/systemd-devel/2023-February/048856.html
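A small sketch of the server-side check floated above, where the new file is offered only when the machine ID ends in "00". The function names are hypothetical. Since a machine ID is written in hexadecimal, two trailing characters give 256 possible suffixes, so the expected share works out to 1/256, close to the "~1/255" estimate.

```python
from itertools import product

HEX_DIGITS = "0123456789abcdef"


def server_offers_new_version(machine_id: str, suffix: str = "00") -> bool:
    """Hypothetical server-side rule: offer the new file to matching IDs only."""
    return machine_id.lower().endswith(suffix)


def expected_share(suffix_length: int = 2) -> float:
    """Fraction of uniformly distributed IDs whose suffix is all zeroes."""
    suffixes = ["".join(p) for p in product(HEX_DIGITS, repeat=suffix_length)]
    target = "0" * suffix_length
    matching = sum(
        1
        for s in suffixes
        if server_offers_new_version("f" * (32 - suffix_length) + s, target)
    )
    return matching / len(suffixes)
```

This assumes machine IDs are uniformly distributed. On stateless systems that generate a new ID on each boot, the share would be higher, as the message itself points out.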