On Tue, Jun 16, 2020, at 21:02, Marc Espie wrote:
>
> The concept you need to understand is snapshot shearing.
>
> A full package snapshot is large enough that it's hard to guarantee that
> you will have a full snapshot on a mirror at any point in time.
>
> In fact, you will sometimes encounter a mix of two snapshots (not that often,
> recently, but still)
>
> Hence, the decision to not have a central index for all packages, but to
> keep (and trust) the actual meta-info within the packages proper.
>
> The only way to avoid that would be to have specific tools for mirroring,
> and to ask mirror sites to set aside twice as much room so that they
> can rotate snapshots.
>
> Now, each package has all dependency information in the packing-list...
> which allows pkg_add to know whether to update or not. We do a somewhat
> strict check, we have tried to relax the check at some point in the past,
> but this was causing more trouble than it was worth.
>
> The amount of data transmitted during pkg_add -u mostly depends on signify:
> you have to grab at least a full signed block, which is slightly over 64KB.
>
> On most modern systems, the bandwidth is not an issue, the number of RTT
> is more problematic. The way to speed pkg_add -u up would be to keep the
> connection alive, which means ditching ftp(1) and switching to specific code
> that talks modern http together with byte-ranges.
>

Thank you for this information. It was very helpful!

So currently with my 433 packages installed, I need to transfer ~28MB of
data and make the associated RTTs every time I want to know if I have
updates. Multiply this by the number of people using OpenBSD that also
check for package updates periodically (a completely unknown number to
me), and it seems like there could be benefits to both the mirror
providers and the users to optimize this a bit :)

I'm spit-balling ideas to optimize this in my head, and I came up with
something that seems simple and feasible to me. Please let me know if
I'm way off-base here with my proposal, as I may be making false
assumptions here.

I noticed that there doesn't seem to be one "master" sha that hashes
*all* of the files of a given package, but rather one sha per file that
the package installs. I'm assuming that this is what causes us to have
to download the +CONTENTS file from every installed package.

Here is my proposal: maintain a single separate file in the mirrors
(similar to the index.txt) where each line looks like this:

got-0.36.tgz N9IEajlcv8snEv9clqcnsZZodggAR8VnFwCdu8I19WY=

where the package name is business as usual, but the hash is actually a
reduction of all the existing hashes from the +CONTENTS into a single
hash (through concatenation, xor, probably doesn't matter that much;
just pick something).

Then, 'pkg_add -u' only downloads this single file as the source of
truth for the current package state. The file can of course be signed +
compressed as well.

Does this idea seem to have any basis in reality?
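
To make the proposal above concrete, here is a rough sketch in Python. It is not how pkg_add works today, and every function name in it is made up for illustration. It assumes that each per-file hash in +CONTENTS sits on its own line as '@sha <base64>', that the hashes are reduced by taking the SHA-256 of their concatenation, and that each package costs one 64KB signed block today (the quoted message says "slightly over 64KB", so the estimate is a floor).

```python
from __future__ import annotations

import base64
import hashlib

# Assumption: one signed block per package, taken as exactly 64 KiB.
SIGNED_BLOCK_BYTES = 64 * 1024


def estimate_transfer_bytes(installed_packages: int) -> int:
    """Rough floor on bytes fetched when every package costs one block."""
    return installed_packages * SIGNED_BLOCK_BYTES


def file_digests(contents_text: str) -> list[bytes]:
    """Collect per-file digests from packing-list text, in file order.

    Assumption: each digest is on its own line as '@sha <base64>'.
    """
    digests = []
    for line in contents_text.splitlines():
        if line.startswith("@sha "):
            digests.append(base64.b64decode(line.split(None, 1)[1]))
    return digests


def combined_digest(digests: list[bytes]) -> str:
    """Reduce many digests to one by hashing their concatenation."""
    h = hashlib.sha256()
    for d in digests:
        h.update(d)
    return base64.b64encode(h.digest()).decode("ascii")


def index_line(package_file: str, contents_text: str) -> str:
    """Build one line of the proposed index: '<package file> <digest>'."""
    return f"{package_file} {combined_digest(file_digests(contents_text))}"


def stale_entries(installed_lines: list[str], mirror_lines: list[str]) -> list[str]:
    """Return installed entries whose exact (name, digest) pair is absent
    from the mirror index. These are only candidates for a closer look."""
    mirror = {tuple(line.split()) for line in mirror_lines if line.strip()}
    return [
        line
        for line in installed_lines
        if line.strip() and tuple(line.split()) not in mirror
    ]
```

With these assumptions, estimate_transfer_bytes(433) gives 28,377,088 bytes, which lines up with the ~28MB figure. On the choice of reduction: hashing the concatenation depends on file order, while a plain xor does not, and an xor would also cancel out two files with identical contents, so the choice may matter a little more than I first thought. Note that a single index like this would still be exposed to the snapshot shearing described in the quoted message, since the index and the packages could come from different snapshots.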

