On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller <and...@kellerfarm.com> wrote:
> Hi all,
>
> I am considering developing a new feature, and I'd like to poll the group for
> opinions.
>
> Background: A couple years ago, I wrote a set of scripts that speed up
> cloning of frequently used repositories. The scripts utilize a bare Git
> repository located at a known location, and automate providing a --reference
> parameter to `git clone` and `git submodule update`. Recently, some
> coworkers of mine expressed an interest in using the scripts, so I published
> the current version of my scripts, called `git repocache`, described at the
> bottom of <https://github.com/andrewkeller/ak-git-tools>.
>
> Slowly, it has occurred to me that this feature, or something similar to it,
> may be worth adding to Git, so I've been thinking about the best approach.
> Here's my best idea so far:
>
> 1) Introduce '--borrow' to `git-fetch`. This would behave similarly to
> '--reference', except that it operates on a temporary basis, and does not
> assume that the reference repository will exist after the operation
> completes, so any used objects are copied into the local objects database.
> In theory, this mechanism would be distinct from '--reference', so if both
> are used, some objects would be copied, and some objects would be accessible
> via a reference repository referenced by the alternates file.

Isn't this the same as git clone --reference <path> --no-hardlinks <url>?

Also, without --no-hardlinks we're not assuming that the other repo doesn't
go away (you could rm -rf it), just that the files won't be *modified*.
Git won't modify them, but you could do so manually with other tools. That
is why the default is to hardlink.

> 2) Teach `git fetch` to read 'repocache.path' (or a better-named
> configuration), and use it to automatically activate borrowing.

So a default path for --reference <path> --no-hardlinks?

> 3) For consistency, `git clone`, `git pull`, and `git submodule update`
> should probably all learn '--borrow', and forward it to `git fetch`.
>
> 4) In some scenarios, it may be necessary to temporarily not automatically
> borrow, so `git fetch`, and everything that calls it may need an argument to
> do that.
>
> Intended outcome: With 'repocache.path' set, and the cached repository
> properly updated, one could run `git clone <url>`, and the operation would
> complete much faster than it does now due to less load on the network.
>
> Things I haven't figured out yet:
>
> * What's the best approach to copying the needed objects? It's probably
> inefficient to copy individual objects out of pack files one at a time, but
> it could be wasteful to copy entire pack files just because you need one
> object. Hard-linking could help, but that won't always be available. One of
> my previous ideas was to add a '--auto-repack' option to `git-clone`, which
> solves this problem better, but introduces some other front-end usability
> problems.
> * To maintain optimal effectiveness, users would have to regularly run a
> fetch in the cache repository. Not all users know how to set up a scheduled
> task on their computer, so this might become a maintenance problem for the
> user. This kind of problem I think brings into question the viability of the
> underlying design here, assuming that the ultimate goal is to clone faster,
> with very little or no change in the use of git.
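
On the question of copying the needed objects: one way to approximate the
"borrow, then keep a private copy" behaviour with existing commands is to
clone with --reference, repack, and then drop the alternates file. The
following is only a rough sketch under stated assumptions, not a proposal
for how '--borrow' should be implemented. The repository names (upstream,
cache.git, clone) and the temporary directory are made up for illustration,
and a file:// URL stands in for a real remote.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Hypothetical upstream with a single commit, standing in for a remote server.
git init -q "$work/upstream"
echo hello > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" -c user.name=Example -c user.email=example@example.com \
    commit -q -m "initial commit"

# Bare cache repository at a "known location" (hypothetical path).
git clone -q --bare "file://$work/upstream" "$work/cache.git"

# Clone while borrowing objects from the cache through the alternates file.
git clone -q --reference "$work/cache.git" "file://$work/upstream" "$work/clone"
alternates="$work/clone/.git/objects/info/alternates"
borrowed=no
if [ -s "$alternates" ]; then
    borrowed=yes
fi

# Without -l, "repack -a" also packs objects reachable only via alternates,
# so the borrowed objects end up in the clone's own object database.
git -C "$work/clone" repack -a -d -q
rm -f "$alternates"

# The cache can now go away; fsck checks that the clone is still complete.
rm -rf "$work/cache.git"
if git -C "$work/clone" fsck --strict >/dev/null; then
    standalone=yes
else
    standalone=no
fi

echo "borrowed=$borrowed standalone=$standalone"
```

This copies whole objects into one new local pack rather than hardlinking
or copying the cache's pack files, so it sidesteps the "entire pack for one
object" concern at the cost of a repack on every clone.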
>
> Thoughts?
>
> Thanks,
> Andrew Keller
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html