While things are quiet (I envy everybody having fun at OLS), I've been cooking something to help clients to pull from dumb servers.

I assume that:

 - The object database is packed, following the recommendations
   in the "Working with Others" section of the tutorial.

 - The repository owner _may_ further create throw-away
   incremental packs.  There can be the following in one object
   database:

    - one baseline pack.
    - permanent incremental packs #1 .. #N
    - one throw-away incremental pack.
    - unpacked files under objects/??/.

   Baseline and permanent incremental packs are built by "git
   repack", just like Linus recommended from the beginning.  The
   throw-away pack is built periodically (say every hour) to
   collect all objects that are not in the baseline nor
   permanent incrementals.  Building of such a throw-away pack
   involves:

    - unpacking and removal of the current throw-away pack.
    - running "git repack".
    - running "git prune-packed".

 - The server could be truly dumb and can even refuse to serve
   dirindex; parsing autogenerated index.html is a pain anyway.

First, a somewhat related change I did was to write a script
called "git ls-remote".  It is used this way:

    $ git ls-remote origin
    17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
    17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
    f0b32737ad5a35cc047db47353a75faccfe5939e refs/heads/linus
    4d9ae497491fd838dafd7fcbd11c4aa678a726f1 refs/heads/pu
    d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
    f25a265a342aed6041ab0cc484224d9ca54b6f41 refs/tags/v0.99.1

It slurps the set of refs from a remote repository (the same
short-hand we stole from Cogito using .git/branches/ can be used
here) and optionally it can be told to store tags under local
refs/.

This is produced by connecting directly to the git-daemon
running on the remote side and talking upload-pack protocol with
it.  A new helper program "git-peek-remote" is used to do this
when we use git:// URL.  From an rsync URL, everything under its
refs/ is copied to a temporary directory to produce the same
information.
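
To make the shape of that output concrete, here is a minimal
sketch of how a client script might turn it into a lookup table.
It assumes each line is an object name followed by whitespace and
a ref name, as in the listing above; the function name
parse_ls_remote is made up for this illustration and is not part
of git.

```python
def parse_ls_remote(output):
    """Map ref name -> object name from "git ls-remote" style text.

    Assumption: one "<object name> <ref name>" pair per line,
    separated by whitespace, as in the listing in this message.
    """
    refs = {}
    for line in output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        sha1, name = fields
        refs[name.strip()] = sha1
    return refs


SAMPLE = """\
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 HEAD
17c0bd743c1c8113cd0ed72b7ca1776d13c27e01 refs/heads/master
d6602ec5194c87b0fc87103ca4d67251c76f233a refs/tags/v0.99
"""
```

Calling parse_ls_remote(SAMPLE) gives a dictionary keyed by
"HEAD", "refs/heads/master" and "refs/tags/v0.99".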

To support the same on a dumb transport, I gave the server side
a new command, "git update-server-info", which prepares this
information in "$repo/info/refs", so writing http support for
"git ls-remote" using curl is trivial.  I arranged things so
that update-server-info is run whenever you push into the
repository via "git push".  You can of course run it by hand
from the command line.

The other file that update-server-info produces is to help dumb
pullers.  It is stored in "$repo/objects/info/pack", and looks
like this:

    P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
    P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
    D 0 1
    D 1
    T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
    T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
    T 1 0397236d43e48e821cce5bbe6a80a1a56bb7cc3a tag
    T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
    T 1 06f6d9e2f140466eeb41e494e14167f90210f89d tag
    T 1 26791a8bcf0e6d33f43aef7682bdb555236d56de tag
    T 1 5dc01c595e6c6ec9ccda4f6f69c131c0dd945f8c tag
    T 1 701d7ecec3e0c6b4ab9bb824fd2b34be4da63b7e tag
    T 1 733ad933f62e82ebc92fed988c7f0795e64dea62 tag
    T 1 9e734775f7c22d2f89943ad6c745571f1930105f tag
    T 1 c521cb0f10ef2bf28a18e1cc8adf378ccbbe5a19 tag
    T 1 ebb5573ea8beaf000d4833735f3e53acb9af844c tag

The lines that start with a 'P' list all the packs available in
this object database (relative to $repo/objects/pack).  These
packs are implicitly numbered starting at 0 in the order they
appear in the file; in the above, the pack c60dc6... is pack #0
and e3117b... is pack #1.

The lines that start with a 'D' list the dependencies.  "D 0 1"
says, pack #0 is not complete and refers to objects found in
pack #1 (e.g. a commit object in pack #0 has a subtree that is
the same one found in pack #1 hence pack #0 does not contain
that tree).  "D 1" shows that the pack #1 is self sufficient and
does not depend on anything (it is the linux-2.6 baseline pack).
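
A reader for this file could be sketched roughly as follows.
This is only an illustration of the layout shown in the sample
above; the function name parse_pack_info and the three return
values are my own choices, and the 'T' lines are simply collected
as (object name, type) pairs per pack number.

```python
def parse_pack_info(text):
    """Parse the P/D/T lines of an objects/info/pack style file.

    Returns (packs, deps, tops):
      packs: list of pack file names; the list index is the
             implicit pack number
      deps:  dict, pack number -> list of pack numbers it depends on
      tops:  dict, pack number -> list of (object name, type)
    Assumption: fields are separated by whitespace, as in the
    sample in this message; unknown line types are ignored.
    """
    packs, deps, tops = [], {}, {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind = fields[0]
        if kind == "P" and len(fields) >= 2:
            packs.append(fields[1])
        elif kind == "D" and len(fields) >= 2:
            deps[int(fields[1])] = [int(f) for f in fields[2:]]
        elif kind == "T" and len(fields) >= 4:
            tops.setdefault(int(fields[1]), []).append((fields[2], fields[3]))
    return packs, deps, tops


SAMPLE_INFO = """\
P pack-c60dc6f7486e34043bd6861d6b2c0d21756dde76.pack
P pack-e3117bbaf6a59cb53c3f6f0d9b17b9433f0e4135.pack
D 0 1
D 1
T 0 9fb1759a3102c26cd8f64254a7c3e532782c2bb8 commit
T 0 a339981ec18d304f9efeb9ccf01b1f04302edf32 tag
T 1 043d051615aa5da09a7e44f1edbb69798458e067 commit
"""
```

With SAMPLE_INFO, deps comes out as {0: [1], 1: []}, which is the
"pack #0 needs pack #1, pack #1 needs nothing" reading given
above.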

Of course, you could have a pack that depends on more than one
pack, in which case you would see something like "D 4 1 2 3" to
mean pack #4 depending on packs #1, #2 and #3.  If the
repository follows the "baseline, permanent incrementals, and
one throw-away" scheme I outlined above, the baseline would be
self sufficient, most likely incremental #i would depend on the
baseline and all the incrementals #j (j < i), and the throw-away
would depend on everybody else.

The lines that start with a 'T' list objects in a pack that are
not referenced by anything else in the same pack (they are
typically branch heads and tags).  We can see that pack #0 has
one head commit and a tag in the above example.

This file always resides at a known location.  A client can do
something like this to slurp from a dumb server:

 (1) Fetch $repo/objects/info/pack file for the above
     information.

 (2) Look at T lines.  If you have all the objects listed there
     for a pack, and if your repository is not incomplete to
     begin with, you are not interested in that pack.  By
     definition, all things that are in that pack are reachable
     from one of those objects listed on the T lines, and you
     already have them.  Otherwise, you _may_ be interested in
     that pack.

 (3) Download corresponding .idx files for the packs you are
     interested in.  Run "git show-index" to see if the
     heads/tags you are interested in appear in one of them (you
     found out about the heads/tags using "git ls-remote"
     earlier).  If you find a pack that contains objects you are
     interested in, look at D lines to make sure you have all
     the head objects from packs that this pack depends on;
     otherwise you need to slurp those depended-upon packs as
     well (needless to say, this goes recursive).

 (4) Download the packs you decided to pick in the previous
     step.  It is up to you if you unpack those packs, but if
     the upstream has it statically packed I would recommend
     against unpacking.
     Next time around you can just look at the name of the pack
     and decide you already have that pack.  On the other hand,
     keeping a throw-away packed may not make much sense.  You
     can unpack the throw-away and then run "git prune-packed"
     in your repository next time you get the pack info file
     from the repository, by noticing that the pack is gone from
     the remote repository already.

 (5) Fill the rest using the commit walker.

The initial client implementation which is _really_ dumb could
even skip steps (2) and (3) and choose to always download/sync
all available packs from the dumb server, and directly go to
step (5) to fall back on the commit walker.

I haven't written the client side, but all the rest that are
necessary to support the above will be sent to the list as
separate patches.
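
Since the client side is not written yet, the following is only
a rough sketch of how the pack selection in steps (2) and (3)
might look, not an actual implementation.  It assumes the D and T
lines have already been parsed into dictionaries keyed by pack
number, that the contents of each downloaded .idx file (as "git
show-index" would list them) are available as a set of object
names, and that the local repository is complete.  All names here
(select_packs, have, idx_objects, wanted) are hypothetical.

```python
def select_packs(deps, tops, have, idx_objects, wanted):
    """Pick the pack numbers worth downloading.

    deps:        pack number -> list of pack numbers it depends on (D lines)
    tops:        pack number -> list of object names from its T lines
    have:        set of object names already in the local repository
    idx_objects: pack number -> set of object names found in its .idx
    wanted:      set of head/tag object names we are after
    """
    def already_have(pack):
        # Step (2): if we hold every object on the T lines, everything
        # in the pack is reachable from objects we already have.
        # A pack with no T lines is conservatively treated as missing.
        heads = tops.get(pack)
        return bool(heads) and all(obj in have for obj in heads)

    # Step (3): start from packs that contain something we want ...
    todo = [p for p, objs in idx_objects.items()
            if not already_have(p) and objs & wanted]
    chosen = set()
    while todo:
        pack = todo.pop()
        if pack in chosen:
            continue
        chosen.add(pack)
        # ... and follow D lines recursively for packs we lack.
        for dep in deps.get(pack, []):
            if not already_have(dep):
                todo.append(dep)
    return chosen
```

For example, with deps = {0: [1], 1: []}, a wanted head that only
appears in pack #0, and an empty local repository, this picks
both packs; if the T-line objects of pack #1 are already present
locally, it picks only pack #0.  Whatever is still missing after
the download is left to the commit walker in step (5).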