On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller <and...@kellerfarm.com> wrote:
> Hi all,
>
> I am considering developing a new feature, and I'd like to poll the group for 
> opinions.
>
> Background: A couple years ago, I wrote a set of scripts that speed up 
> cloning of frequently used repositories.  The scripts utilize a bare Git 
> repository located at a known location, and automate providing a --reference 
> parameter to `git clone` and `git submodule update`.  Recently, some 
> coworkers of mine expressed an interest in using the scripts, so I published 
> the current version of my scripts, called `git repocache`, described at the 
> bottom of <https://github.com/andrewkeller/ak-git-tools>.
>
> Slowly, it has occurred to me that this feature, or something similar to it, 
> may be worth adding to Git, so I've been thinking about the best approach.  
> Here's my best idea so far:
>
> 1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
> '--reference', except that it operates on a temporary basis, and does not 
> assume that the reference repository will exist after the operation 
> completes, so any used objects are copied into the local objects database.  
> In theory, this mechanism would be distinct from '--reference', so if both 
> are used, some objects would be copied, and some objects would be accessible 
> via a reference repository referenced by the alternates file.

Isn't this the same as git clone --reference <path> --no-hardlinks <url> ?

Also, without --no-hardlinks we're not assuming that the other repo
won't go away (you could rm -rf it), just that its files won't be
*modified*. Git itself never does that (though you could with other
tools), so the default is to hardlink.
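For comparison, here is a minimal local sketch of what `--reference` does today, and of how the temporary borrowing described in point 1 can be approximated by hand. It assumes a POSIX shell and a Git that supports `git -C`. The repository names are made up, and the `file://` URL merely stands in for a remote <url>. The `git repack -a -d` step follows the git-clone documentation's advice for breaking a dependency on a borrowed repository; it is not the proposed '--borrow' itself.

```shell
#!/bin/sh
# Sketch only: clone through a local object cache with --reference, then
# approximate the proposed '--borrow' by copying the objects in and
# dropping the alternates link. All repository names are made up.
set -e

tmp=$(mktemp -d)
cd "$tmp"

# Throwaway "upstream" with one commit; stands in for the remote <url>.
git init -q upstream
echo hello > upstream/README
git -C upstream add README
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"

# The cache: a bare repository kept at a known location.
git clone -q --bare upstream "$tmp/cache.git"

# Clone over a file:// URL so the normal transport is used; objects
# already present in the cache are found through the alternates file.
git clone -q --reference "$tmp/cache.git" "file://$tmp/upstream" work
cat work/.git/objects/info/alternates   # prints .../cache.git/objects

# Temporary borrow: pack every reachable object locally (repack -a also
# picks up objects that live only in the alternate), then cut the link.
git -C work repack -q -a -d
rm work/.git/objects/info/alternates

# The clone no longer needs the cache.
rm -rf "$tmp/cache.git"
git -C work fsck
echo "work is self-contained"
```

Note that this copies every reachable object rather than only the ones actually used, which is the same trade-off raised under "Things I haven't figured out yet".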

> 2)  Teach `git fetch` to read 'repocache.path' (or a better-named 
> configuration), and use it to automatically activate borrowing.

So a default path for --reference <path> --no-hardlinks ?
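A default path of that kind can be approximated today with a small wrapper, which is roughly what the `git repocache` scripts automate. This is a minimal sketch, assuming a POSIX shell. `repocache.path` is only the key name proposed above (Git itself does not read it), and `clone_with_cache` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: 'repocache.path' is the key proposed in this thread, not a
# setting Git reads itself; clone_with_cache is a hypothetical helper.
set -e

# Clone with --reference when repocache.path names an existing directory,
# otherwise fall back to a plain clone. Arguments go to `git clone`.
clone_with_cache () {
    cache=$(git config --get repocache.path || true)
    if [ -n "$cache" ] && [ -d "$cache" ]; then
        git clone --reference "$cache" "$@"
    else
        git clone "$@"
    fi
}

# Demo in a scratch directory with an isolated HOME, so the real
# ~/.gitconfig is left alone.
tmp=$(mktemp -d)
export HOME="$tmp" XDG_CONFIG_HOME="$tmp/.config" GIT_CONFIG_NOSYSTEM=1
cd "$tmp"
git init -q upstream
echo hello > upstream/README
git -C upstream add README
git -C upstream -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "initial commit"
git clone -q --bare upstream cache.git

git config --global repocache.path "$tmp/cache.git"
clone_with_cache -q "file://$tmp/upstream" work   # uses the cache

git config --global --unset repocache.path
clone_with_cache -q "file://$tmp/upstream" plain  # plain clone
```

Unlike the proposed '--borrow', this wrapper leaves a lasting dependency on the cache through the alternates file.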

> 3)  For consistency, `git clone`, `git pull`, and `git submodule update` 
> should probably all learn '--borrow', and forward it to `git fetch`.
>
> 4)  In some scenarios, it may be necessary to temporarily disable automatic 
> borrowing, so `git fetch` and everything that calls it may need an option 
> to do that.
>
> Intended outcome: With 'repocache.path' set, and the cached repository 
> properly updated, one could run `git clone <url>`, and the operation would 
> complete much faster than it does now due to less load on the network.
>
> Things I haven't figured out yet:
>
> *  What's the best approach to copying the needed objects?  It's probably 
> inefficient to copy individual objects out of pack files one at a time, but 
> it could be wasteful to copy entire pack files just because you need one 
> object.  Hard-linking could help, but that won't always be available.  One of 
> my previous ideas was to add a '--auto-repack' option to `git-clone`, which 
> solves this problem better, but introduces some other front-end usability 
> problems.
> *  To maintain optimal effectiveness, users would have to regularly run a 
> fetch in the cache repository.  Not all users know how to set up a scheduled 
> task on their computer, so this might become a maintenance problem for the 
> user.  This kind of problem, I think, calls into question the viability of 
> the underlying design here, assuming that the ultimate goal is to clone 
> faster with very little or no change in how Git is used.
>
>
> Thoughts?
>
> Thanks,
> Andrew Keller
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html