On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller <and...@kellerfarm.com> wrote:
> Hi all,
> I am considering developing a new feature, and I'd like to poll the group for
> feedback.
> Background: A couple years ago, I wrote a set of scripts that speed up
> cloning of frequently used repositories. The scripts utilize a bare Git
> repository located at a known location, and automate providing a --reference
> parameter to `git clone` and `git submodule update`. Recently, some
> coworkers of mine expressed an interest in using the scripts, so I published
> the current version of my scripts, called `git repocache`, described at the
> bottom of <https://github.com/andrewkeller/ak-git-tools>.
> Slowly, it has occurred to me that this feature, or something similar to it,
> may be worth adding to Git, so I've been thinking about the best approach.
> Here's my best idea so far:
> 1) Introduce '--borrow' to `git-fetch`. This would behave similarly to
> '--reference', except that it operates on a temporary basis, and does not
> assume that the reference repository will exist after the operation
> completes, so any used objects are copied into the local objects database.
> In theory, this mechanism would be distinct from '--reference', so if both
> are used, some objects would be copied, and some objects would be accessible
> via a reference repository referenced by the alternates file.
Isn't this the same as `git clone --reference <path> --no-hardlinks <url>`?
Also, even without --no-hardlinks we're not assuming that the other repo
sticks around (you could rm -rf it), only that its files won't be
*modified*. Git itself never modifies them, although you could do so
manually with other tools, which is why hardlinking is the default.
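
The hardlink point can be checked outside Git. Here is a minimal Python sketch, assuming a filesystem that supports hard links. The file names are made up for illustration and nothing here calls Git itself. It only shows the two cases: removing the original name leaves the data readable through the link, while an in-place write to the original is visible through the link.

```python
import os
import tempfile


def read_after_source_removed(payload: bytes) -> bytes:
    """Hard-link a file, delete the original name, then read via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "reference.pack")  # hypothetical file name
        linked = os.path.join(tmp, "clone.pack")      # hypothetical file name
        with open(source, "wb") as f:
            f.write(payload)
        os.link(source, linked)  # second name for the same file data
        os.remove(source)        # comparable to deleting the reference repo
        with open(linked, "rb") as f:
            return f.read()


def read_after_source_modified(payload: bytes, patch: bytes) -> bytes:
    """Hard-link a file, overwrite the original in place, read via the link."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "reference.pack")
        linked = os.path.join(tmp, "clone.pack")
        with open(source, "wb") as f:
            f.write(payload)
        os.link(source, linked)
        with open(source, "r+b") as f:  # in-place write to the shared data
            f.write(patch)
        with open(linked, "rb") as f:
            return f.read()
```

The first function returns the payload unchanged; the second returns the patched bytes, which is the only case that would hurt a hardlinked clone.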
> 2) Teach `git fetch` to read 'repocache.path' (or a better-named
> configuration), and use it to automatically activate borrowing.
So a default path for `--reference <path> --no-hardlinks`?
> 3) For consistency, `git clone`, `git pull`, and `git submodule update`
> should probably all learn '--borrow', and forward it to `git fetch`.
> 4) In some scenarios, it may be necessary to temporarily not automatically
> borrow, so `git fetch`, and everything that calls it may need an argument to
> do that.
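
Points 2 and 4 could be prototyped in a wrapper before touching Git. A rough Python sketch follows, assuming `git` is on PATH and that the 'repocache.path' key from the proposal is set with ordinary `git config`. The function names and the `borrow` switch are hypothetical. It only assembles the command line discussed above and makes no claim about what Git does with borrowed objects afterwards.

```python
import subprocess
from typing import List, Optional


def configured_cache_path() -> Optional[str]:
    """Return the value of 'repocache.path', or None if it is unset."""
    result = subprocess.run(
        ["git", "config", "--get", "repocache.path"],
        capture_output=True, text=True, check=False,
    )
    value = result.stdout.strip()
    return value if result.returncode == 0 and value else None


def clone_command(url: str, cache_path: Optional[str],
                  borrow: bool = True) -> List[str]:
    """Build the clone command, adding the reference options when borrowing.

    'borrow=False' is a hypothetical opt-out for the "temporarily do not
    borrow" case in point 4.
    """
    cmd = ["git", "clone"]
    if borrow and cache_path:
        cmd += ["--reference", cache_path, "--no-hardlinks"]
    cmd.append(url)
    return cmd
```

A caller would pass `configured_cache_path()` as `cache_path` and hand the resulting list to `subprocess.run`.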
> Intended outcome: With 'repocache.path' set, and the cached repository
> properly updated, one could run `git clone <url>`, and the operation would
> complete much faster than it does now due to less load on the network.
> Things I haven't figured out yet:
> * What's the best approach to copying the needed objects? It's probably
> inefficient to copy individual objects out of pack files one at a time, but
> it could be wasteful to copy entire pack files just because you need one
> object. Hard-linking could help, but that won't always be available. One of
> my previous ideas was to add a '--auto-repack' option to `git-clone`, which
> solves this problem better, but introduces some other front-end usability
> problems.
> * To maintain optimal effectiveness, users would have to regularly run a
> fetch in the cache repository. Not all users know how to set up a scheduled
> task on their computer, so this might become a maintenance problem for the
> user. This kind of problem I think brings into question the viability of the
> underlying design here, assuming that the ultimate goal is to clone faster,
> with very little or no change in the use of git.
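
On the maintenance concern, one alternative to a scheduled task would be for a wrapper to check how old the cache is before using it. The sketch below is only an illustration, assuming the cache is a bare repository and that the modification time of its FETCH_HEAD file is a good enough proxy for the last fetch. The function name and the age threshold are hypothetical.

```python
import os
import time
from typing import Optional


def cache_is_stale(cache_dir: str, max_age_seconds: float,
                   now: Optional[float] = None) -> bool:
    """Report whether the cache repository looks overdue for a fetch.

    Assumes a bare repository, so FETCH_HEAD sits directly in cache_dir.
    A missing FETCH_HEAD is treated as "never fetched", hence stale.
    """
    fetch_head = os.path.join(cache_dir, "FETCH_HEAD")
    try:
        last_fetch = os.path.getmtime(fetch_head)
    except OSError:
        return True
    current = time.time() if now is None else now
    return (current - last_fetch) > max_age_seconds
```

A wrapper could run a fetch in the cache whenever this returns True, so the user would not need to set up anything on their machine.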
> Andrew Keller
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html