On Sat, Jan 30, 2016 at 12:41:41AM -0500, Jeff King wrote:
> It looks like this has been broken since cd547b4 (fetch/push: readd
> rsync support, 2007-10-01). The fix is just to ignore packed-refs
> entries which duplicate loose ones. But given the length of time this
> has been broken with nobody complaining, I have to wonder if it is
> simply time to retire the rsync protocol. Even if it were made to work, it
> is a horribly inefficient protocol.
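
To make that fix concrete, here is a rough sketch in shell (not the actual C code in transport.c) of listing the refs from a directory of rsynced data so that a loose ref shadows a packed-refs entry of the same name. The helper name list_refs is made up for illustration. It assumes loose ref files hold a plain object name (no symbolic refs), and that packed-refs has the usual "#" header, "<sha> <refname>" lines and "^<sha>" peeled lines. It simply skips the peeled lines.

```shell
# Hypothetical helper: print "<object-name> <refname>" for every ref in a
# directory of rsynced data, letting a loose ref shadow a packed one.
# Assumes loose ref files hold a plain object name (no "ref: ..." symrefs).
list_refs () {
	dir=$1

	# Loose refs: one file per ref under refs/.
	(cd "$dir" && find refs -type f) | sort |
	while read -r ref; do
		printf '%s %s\n' "$(cat "$dir/$ref")" "$ref"
	done

	# Packed refs: skip the "#" header and "^" peeled lines, and skip
	# any ref that also exists as a loose file (the duplicate case).
	test -f "$dir/packed-refs" || return 0
	while read -r oid ref; do
		case $oid in
		'#'* | '^'*) continue ;;
		esac
		test -f "$dir/$ref" && continue
		printf '%s %s\n' "$oid" "$ref"
	done <"$dir/packed-refs"
}
```

Given a loose refs/heads/main and a stale packed-refs entry for the same name, it reports refs/heads/main only once, with the loose value.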

I took a look at whether there would be an easy fix. There are three
obvious ways to go about this:

  1. Use the loose/packed reading code from refs/files-backend.c.

     This would require some refactoring, as we currently assume we are
     either reading the refs for _this_ repository, or for a submodule.
     This is sort of like reading a submodule, but I think there are a
     few rough edges.

     Worse, though, is that the upcoming pluggable refs work will
     probably require that submodules and the main repo have the same
     ref backend. I'm a little dubious of that requirement in general,
     but it would certainly be a show-stopper here.

  2. Create a "struct transport" for the tempdir holding the data we
     rsynced from the other side, and just treat it like a local repo.
     We already do something like this to handle object "alternates"
     repositories (we run "upload-pack" on the other directory and
     parse its output just like a real remote).

     Unfortunately, what we bring over in get_refs_via_rsync is not
     enough for this to work. It's _just_ the refs/ directories. We can
     use "git init" to make it more like a real repo, but ultimately we
     don't have any objects, so upload-pack will complain.

     We could fix that by just rsyncing the objects down at this stage,
     too. It's not as if git does a careful "what do we need" walk for
     rsync the way it does for dumb-http. But we would end up rsyncing
     objects even in cases where we didn't need _any_, though that is
     probably a small minority of cases.

  3. Just teach the local ad-hoc loose and packed readers to do the
     proper deduplication. I started on this, but then realized that we
     really do implement a from-scratch packed-refs reader here, and
     it's missing some features, like parsing peeled tags. So it would
     really want to call into the regular packed-refs parsing code,
     which requires the same kind of refactoring as (1).
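
Approach (2) might look roughly like the following shell sketch. It assumes the tempdir already holds refs/, objects/ and packed-refs rsynced from the other side (as noted above, today only the refs come over, so the objects would have to be fetched at this stage too). The helper name refs_from_rsynced_dir is made up for illustration. "git init --bare" is safe to re-run on an existing directory and only adds what is missing (HEAD, config, and so on). "git ls-remote" on a local path runs upload-pack, so the loose-vs-packed precedence comes from the real refs code.

```shell
# Hypothetical helper: list the refs in a directory that holds rsynced
# refs/, objects/ and packed-refs, by treating it as a local repository.
refs_from_rsynced_dir () {
	dir=$1

	# Add the missing repository files (HEAD, config, ...). Re-running
	# "git init" does not overwrite refs/ or objects/ that already exist.
	git init -q --bare "$dir" || return 1

	# Let upload-pack read the refs, as for any local remote; it
	# resolves loose vs. packed refs and reports peeled tags itself.
	git ls-remote "$dir"
}
```

Because upload-pack reads the refs, a loose ref that duplicates a packed one is reported only once, and annotated tags get their peeled "^{}" lines. Note that the HEAD written by "git init" is just the local default, not the remote's.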

Of all of these, I think (2) is the closest to sane, because it lets
upload-pack do the heavy lifting, meaning we can understand whatever
formats we rsync from the other side. But given that rsync is already
naive about what objects it pulls (i.e., it gets everything), I really
have to question whether there is any value in using git-over-rsync
versus just:

  rsync -a "$src"/ tmp/
  git clone tmp my-repo ;# will hard-link, no extra space needed!
  rm -rf tmp

I guess that doesn't handle subsequent fetches. But really,
git-over-rsync is just an awful protocol. Nobody should be using it.
Having looked at it in more detail, I'm more in favor than ever of
removing it.

-Peff