On Mon, 2007-06-25 at 17:51 -0400, C. Scott Ananian wrote:
> As food for discussion, here's a counter proposal for an XO upgrade
> mechanism, focusing on the network side.
> ---------
> Design goals:
> - minimize round-trips necessary for successful upgrade
> - minimize size of upgrades
(I'm using this reply as a general reply for the whole update thread instead of replying to each mail.)

First, I must say that the numbers you posted for psdiff look pretty nice for smaller updates like security fixes. (I'm not sure they will help as much for larger updates, though.) It sounds interesting to try to use this if possible.

The original discussion about diffs and updates talked about storing the original "gold" image on all laptops, and then sending each update as a diff from that image. This would mean that the diffs keep growing in size, becoming larger and larger to send, and we'd need to use more and more space for original image data that is no longer used. Your proposal instead seems to be that all diffs are incremental (and available as reverse diffs for incremental reversal). This keeps each update small, but in exchange you need to get all the updates and apply them in order.

There is a general difference in how data is distributed in these two proposals. In your proposal each machine polls and downloads from a central server. This relies on the mesh (and ultimately a connection to the internet) for downloading the data, which makes it very sensitive to the performance of the mesh network. To mitigate this it uses multicast, so that the same data isn't sent over the mesh too many times.

My distribution model is instead more decentralized. The idea is that while full mesh network traffic is spotty and slow, the local wlan connections have no reason to be much worse than any other wlan connections (including ours). So downloading a full file from a neighbour isn't all that slow, nor does it affect the general performance of the mesh. Instead of all machines downloading the update from the server via the mesh (or multicast), upgrades happen in stages. The first stage is the school server getting the new version and announcing it. Then all machines that are close to the server will find the upgrade and update.
Then machines close to these laptops will upgrade from the laptops, and the update will propagate outwards one machine at a time. (Things are also dynamic, as the machines move around.) The spread is "exponential", since the number of machines with the update available keeps increasing, and at no time are we sending data over the mesh. This distributed system makes it possible to upgrade laptops that don't have a connection reaching the school server, as long as another updated laptop happens to be in the neighborhood at some point. It also makes it possible to update such a laptop to any version, without having to store all the diffs between the two versions in the update (i.e. if the laptop hasn't had a connection to the net in a while and doesn't have a recent update).

I must say that the multicasting portion of your proposal sounds risky to me. It seems pretty easy for there to be multiple Upgrade Leaders for the same version, both over time and in parallel (for instance due to missed packets, laptops turned on at different times, etc.). Each of these sending multicast messages on the mesh as they upgrade sounds like it could easily fill the mesh with a lot of multicast traffic. When you say "multicast", how far do you mean these packets would be sent? I guess on a mesh network multicast packets are limited by some kind of hop limit, if only to avoid loops? Some more info on how multicast works on the mesh would be nice.

Also, I keep seeing references to vserver, but I still haven't seen any way it could be used to update the system image. In the case of your proposal this doesn't change much, though. For instance, if we used jffs2 filesystem transactions instead, the only difference is that applying the diffs as they arrive might not be a good idea; instead we'd keep them and apply them all in one transaction when we're ready.
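To make the staged spread above concrete, here's a toy simulation (names and parameters are mine, purely illustrative): the school server starts with the update, and each round every updated machine can serve a couple of laptops within wlan range. Coverage grows roughly exponentially until it saturates, with no traffic ever crossing the wider mesh.

```python
import random

def simulate_spread(num_laptops=100, served_per_round=2, rounds=10):
    """Toy model of staged update propagation.  Node 0 is the school
    server and starts updated; each round, every updated node serves
    a few random peers (standing in for 'laptops in radio range')."""
    updated = {0}
    history = [len(updated)]
    for _ in range(rounds):
        newly = set()
        for node in updated:
            # Each updated machine hands the update to a few peers
            # over a direct local link, never over the mesh.
            for peer in random.sample(range(num_laptops), served_per_round):
                if peer not in updated:
                    newly.add(peer)
        updated |= newly
        history.append(len(updated))
    return history

# Coverage per round, e.g. [1, 3, 8, 20, ...] up to num_laptops.
print(simulate_spread())
```

The early rounds show the "exponential" phase the mail describes; the tail flattens once most machines already have the update.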
So, to sum it up: my approach certainly uses more bits on the network for data transfer, but I don't think that is a large issue, since all (most?) transfers are local and should thus get good performance. It is also less dependent on direct connections to the internet (or the school server), and it doesn't cause mesh network traffic that eats bandwidth for other people. It's also, IMHO, vastly simpler to implement, test, and deploy.

However, it certainly would be nice if we could use the binary diff approach in a distributed system too. A simple extension would be to do the diffs on the school server. We could easily store diffs between "consecutive" blobs and manifests (say as http://server/blobs/{first-sha1}-{second-sha1}), and the client could detect that the server has both the version it already has and the target version and request a diff between the two files instead of a full blob (falling back to the full file if the diff isn't available).

It's a bit harder to distribute this to the laptops, though: since we don't want to store older data on them, they can't generate the diffs themselves. I guess we could store the diffs from the previous version to the current version, hoping that this wouldn't use a lot of storage. That is a question of tradeoffs: local bandwidth vs laptop storage. Maybe there is a better way to use diffs. Lemme think on it for a bit.

_______________________________________________
Devel mailing list
[email protected]
http://lists.laptop.org/listinfo/devel
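P.S. The diff-or-fall-back fetch could look something like this sketch. The {first-sha1}-{second-sha1} URL layout is the one suggested above; the function and parameter names are mine, `get` stands in for an HTTP fetch from the school server, and actually applying the diff (e.g. with psdiff) is left out.

```python
def fetch_update(get, current_sha1, target_sha1):
    """Prefer a binary diff from the version we already have to the
    target version; fall back to the full blob when the server has
    no such diff.  `get(url)` returns the body as bytes, or None if
    the file isn't on the server."""
    diff = get(f"/blobs/{current_sha1}-{target_sha1}")
    if diff is not None:
        return ("diff", diff)  # caller applies this to its local blob
    # No diff stored for this version pair: download the full blob.
    return ("full", get(f"/blobs/{target_sha1}"))

# Demo against an in-memory stand-in for the server:
server = {"/blobs/aaa-bbb": b"<diff>", "/blobs/bbb": b"<full image>"}
print(fetch_update(server.get, "aaa", "bbb"))  # ('diff', b'<diff>')
```

The same client logic works unchanged when fetching from an updated neighbour laptop instead of the school server, which is what makes it compatible with the staged distribution model.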
