Re: New tool to (quickly) check for available package upgrades

Jeremy O'Brien Wed, 17 Jun 2020 05:24:23 -0700

On Tue, Jun 16, 2020, at 21:02, Marc Espie wrote:
> 
> The concept you need to understand is snapshot shearing.
> 
> A full package snapshot is large enough that it's hard to guarantee that
> you will have a full snapshot on a mirror at any point in time.
> 
> In fact, you will sometimes encounter a mix of two snapshots (not that often,
> recently, but still)
> 
> Hence, the decision to not have a central index for all packages, but to
> keep (and trust) the actual meta-info within the packages proper.
> 
> The only way to avoid that would be to have specific tools for mirroring,
> and to ask mirror sites to set aside twice as much room so that they
> can rotate snapshots.
> 
> Now, each package has all dependency information in the packing-list...
> which allows pkg_add to know whether to update or not.  We do a somewhat
> strict check. We tried to relax the check at some point in the past,
> but this was causing more trouble than it was worth.
> 
> The amount of data transmitted during pkg_add -u mostly depends on signify:
> you have to grab at least a full signed block, which is slightly over 64KB.
> 
> On most modern systems, the bandwidth is not an issue; the number of RTTs
> is more problematic.  The way to speed pkg_add -u up would be to keep the
> connection alive, which means ditching ftp(1) and switching to specific code
> that talks modern http together with byte-ranges.
>


Thank you for this information. It was very helpful!

So currently with my 433 packages installed, I need to transfer ~28MB of data 
and make the associated RTTs every time I want to know if I have updates. 
Multiply this by the number of people using OpenBSD that also check for package 
updates periodically (a completely unknown number to me), and it seems like 
there could be benefits to both the mirror providers and the users to optimize 
this a bit :)
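
As a rough sanity check on that ~28MB figure, here is a minimal sketch of the
arithmetic. It assumes exactly one signed block per installed package and
rounds the "slightly over 64KB" block down to 64 KiB, so it is a lower bound
rather than a measurement. The names below are made up for illustration:

```python
# Lower-bound estimate of the data fetched by one update check.
# Assumption: one signify signed block per installed package, and the
# block is taken as exactly 64 KiB (the thread says "slightly over 64KB").
SIGNED_BLOCK_BYTES = 64 * 1024
INSTALLED_PACKAGES = 433


def min_transfer_bytes(packages, block_bytes=SIGNED_BLOCK_BYTES):
    """Return the minimum number of bytes fetched for `packages` packages."""
    return packages * block_bytes


if __name__ == "__main__":
    total = min_transfer_bytes(INSTALLED_PACKAGES)
    print(f"{total} bytes, about {total / 1e6:.1f} MB")
```

With these assumptions the total comes to 28,377,088 bytes, which lines up
with the ~28MB I am seeing.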

I'm spitballing ideas to optimize this, and I came up with something that 
seems simple and feasible to me. Please let me know if I'm way off-base with 
my proposal, as I may be making false assumptions.

I noticed that there doesn't seem to be one "master" sha that hashes *all* of 
the files of a given package, but rather one sha per file that the package 
installs. I'm assuming that this is what causes us to have to download the 
+CONTENTS file from every installed package. Here is my proposal: maintain a 
single separate file in the mirrors (similar to the index.txt) where each line 
looks like this:

got-0.36.tgz    N9IEajlcv8snEv9clqcnsZZodggAR8VnFwCdu8I19WY=

where the package name is business as usual, but the hash is actually a 
reduction of all the existing hashes from the +CONTENTS into a single hash 
(by hashing their concatenation, by xor; it probably doesn't matter that much, 
just pick something). Then, 'pkg_add -u' only downloads this single file as 
the source of truth for the current package state. The file can of course be 
signed + compressed as well.
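
To make the idea concrete, here is a rough sketch of the reduction and of the
client-side comparison. It is not how pkg_add works today. It assumes the
per-file hashes appear in +CONTENTS as lines starting with "@sha ", it picks
SHA-256 over the concatenated hashes as the reduction, and it matches packages
by exact file name only, so a version bump simply shows up as "not in the
index". All function names and the sample hashes are made up:

```python
import base64
import hashlib


def reduce_packing_list(contents):
    """Fold every "@sha <digest>" line of a packing-list into one digest.

    Assumption: per-file hashes appear as lines beginning with "@sha ".
    The reduction is SHA-256 over the hashes in file order, one per line.
    """
    h = hashlib.sha256()
    for line in contents.splitlines():
        if line.startswith("@sha "):
            h.update(line[len("@sha "):].strip().encode("ascii"))
            h.update(b"\n")  # separator so adjacent hashes cannot blur together
    return base64.b64encode(h.digest()).decode("ascii")


def parse_index(text):
    """Parse lines of the form "<package file name> <digest>" into a dict."""
    index = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:
            index[fields[0]] = fields[1]
    return index


def changed_packages(installed, index):
    """Return installed names whose digest is missing from or differs in the index."""
    return sorted(
        name for name, digest in installed.items() if index.get(name) != digest
    )


if __name__ == "__main__":
    # Fake packing-list with placeholder hashes, for illustration only.
    sample = "@name demo-1.0\nbin/demo\n@sha AAAA\nshare/doc/demo\n@sha BBBB\n"
    local = {"demo-1.0.tgz": reduce_packing_list(sample)}
    mirror = parse_index("demo-1.0.tgz    not-the-same-digest=\n")
    print(changed_packages(local, mirror))
```

Only the packages this reports would then need their +CONTENTS fetched for
the real (strict) check. Note that plain xor would ignore file order, while
hashing the concatenation does not; I don't know whether that matters here.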

Does this idea seem to have any basis in reality?
