>>>>> "PB" == Petr Baudis <[EMAIL PROTECTED]> writes:
PB> It won't happen. Or rather, I hope the HTTP pulls become more efficient
PB> soon. Actually, perhaps Linus has something done already, my workstation
PB> is a bit derailed now so I couldn't pull from him in the last few days
PB> (hopefully will sort that out today).
PB> Hmm, yes, I guess Linus won't be touching the HTTP backend at all. ;-) I
PB> suggest you check the latest developments in Linus' branch and sync with
PB> Daniel Barkalow, who has promised to improve the pull tools as well.
If this weekend is not too late, I have been brewing what is
called an "efficient pull from dumb servers" suite, which would
hopefully fill this gap. I am still in the process of finishing
the details, but basically it already seems to work.
Linus, please drop the patch I sent you earlier (by mistake I
sent it privately without CCing the list); it implemented only
the server end, and I've since changed some of its file formats.
The outline of how it works is like this.
* I assume a dumb transport (read: a static-files-only HTTP
server) with no on-request server-side processing. All the
smarts must go in the client. The server side X.git is an
ordinary GIT archive (no need for files in the work tree):
- X.git/objects/pack can have packed GIT archives. I
envision this will be a series of 5 to 20 MB packs, with a
new incremental pack added occasionally when the
X.git/objects/??/ directories accumulate enough standalone
SHA1 files. An object contained in one of the packs does
not need to exist as an X.git/objects/??/ file.
- X.git/info/ has three extra files.
- "inventory" lists all the branches stored in X.git/refs
and looks like this (contents and path):
This is to facilitate discovery from a transport that is
not so "ls" friendly, like HTTP.
- "pack" lists the available packs under X.git/objects/pack
and looks like this (size and name):
The file is there for discovery. The size is used by the
client to choose the optimum set of packs to slurp.
- "rev-cache" is a binary file that describes commit
ancestry information in a dense format. It lists all
commits available from this repository, along with the
parents of each commit. The file is produced append-only,
so that the server side can use an rsync-based mirroring
scheme.
A new command "git-update-dumb-server" is used to prepare
these three files. A helper script may be needed that uses
git-pack-objects and friends to prepare packs partitioned to
allow pulling a popular branch efficiently.
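The server-side preparation could be sketched along these lines. This is
only an illustration in Python for brevity; the exact layouts assumed here
(SHA1-plus-path lines for "inventory", size-plus-name lines for "pack",
and a fixed binary record for "rev-cache") are my guesses at plausible
formats, not necessarily what git-update-dumb-server will emit:

```python
# Hypothetical sketch of preparing the three info files.  All file
# formats below are assumptions made for illustration only.
import os, struct

def write_inventory(git_dir):
    # "inventory": one line per ref, "<sha1> <path>", mirroring refs/.
    lines = []
    for root, _, files in os.walk(os.path.join(git_dir, "refs")):
        for name in files:
            path = os.path.join(root, name)
            sha1 = open(path).read().strip()
            lines.append("%s %s\n" % (sha1, os.path.relpath(path, git_dir)))
    with open(os.path.join(git_dir, "info", "inventory"), "w") as f:
        f.writelines(sorted(lines))

def write_pack_list(git_dir):
    # "pack": one line per pack, "<size-in-bytes> <name>", so the
    # client can weigh download cost without fetching the pack.
    pack_dir = os.path.join(git_dir, "objects", "pack")
    with open(os.path.join(git_dir, "info", "pack"), "w") as f:
        for name in sorted(os.listdir(pack_dir)):
            if name.endswith(".pack"):
                size = os.path.getsize(os.path.join(pack_dir, name))
                f.write("%d %s\n" % (size, name))

def append_rev_cache(git_dir, new_commits):
    # "rev-cache": append-only binary records (20-byte commit sha,
    # 1-byte parent count, 20 bytes per parent), so a mirror can be
    # kept current by rsync-style tail appends.
    with open(os.path.join(git_dir, "info", "rev-cache"), "ab") as f:
        for sha, parents in new_commits:
            f.write(bytes.fromhex(sha))
            f.write(struct.pack("B", len(parents)))
            for p in parents:
                f.write(bytes.fromhex(p))
```

The append-only property of "rev-cache" is the point: a mirror never has
to re-read the whole ancestry file, only its new tail.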
* The client side is called "git-dumb-pull-script". This
downloads the above three files, plus the .idx files
associated with the packs described in "pack". Using the
information in "inventory" about the desired branch to pull,
along with the "rev-cache" ancestry information, it
discovers the set of commits lacking from its local store.
By comparing that list with the downloaded .idx files and
the size information for each pack, it comes up with a list
of packs to download that covers the most commits it wants
to obtain, then downloads, verifies, and stores them in its
local object store.
The above process of downloading packs would typically not
cover everything that is lacking, because some new commits
may not be in any of the packs. After this point, the usual
commit-walking git-http-pull can be used to fill in the
rest, and it does not have to pull that many objects. Dan's
http-pull parallelism improvement would be very useful here.
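The pack-selection step is essentially a weighted set-cover problem, and a
simple greedy heuristic (best coverage per byte first) should be good
enough in practice. A hypothetical sketch, assuming the set of wanted
objects and each pack's contents have already been extracted from
"rev-cache" and the downloaded .idx files; the function name and policy
here are mine, not the actual script's:

```python
def choose_packs(wanted, packs):
    """Greedy sketch: 'wanted' is a set of object names missing locally;
    'packs' maps pack name -> (size_in_bytes, set_of_objects), as learned
    from the "pack" file and the .idx files.  Returns the packs to
    download and whatever objects no pack can supply."""
    chosen = []
    missing = set(wanted)
    while missing:
        best, best_score = None, 0.0
        for name, (size, objs) in packs.items():
            if name in chosen:
                continue
            gain = len(missing & objs)        # objects this pack would add
            score = gain / float(size)        # coverage per byte
            if gain and score > best_score:
                best, best_score = name, score
        if best is None:
            break  # leftovers must come from the commit-walking pull
        chosen.append(best)
        missing -= packs[best][1]
    return chosen, missing
```

Whatever remains in the returned 'missing' set is exactly the residue the
commit-walking git-http-pull would fetch object by object.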