Stefan Beller wrote:
> On Wed, Feb 10, 2016 at 12:11 PM, Shawn Pearce <[email protected]> wrote:
>> Several of us at $DAY_JOB talked about this more today and thought a
>> variation makes more sense:
>>
>> 1. Clients attempting clone ask for /info/refs?service=git-upload-pack
>> like they do today.
>>
>> 2. Servers that support resumable clone include a "resumable"
>> capability in the advertisement.
>
> like "resumable-token=hash" similar to a push cert advertisement?
It could just be the string 'resumable'.
But I wonder if it would be possible to save a round-trip by getting the
302 in response to the initial request. If the client requests

    /info/refs?service=git-upload-pack&want_resumable=true

then the server would be allowed to respond with a 302 redirect to its
current mostly whole pack. Current clients would never send such a
request because the current protocol requires, for smart clients:

    The request MUST contain exactly one query parameter,
    `service=$servicename`, where `$servicename` MUST be the service
    name the client wishes to contact to complete the operation.
    The request MUST NOT contain additional query parameters.

Current http-backend ignores extra query parameters. I haven't
checked other smart http server implementations, though.
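
Roughly what I have in mind on the client side, as a Python sketch and
not real git code. The want_resumable parameter is the one proposed
above; the function names and the example URLs are made up for
illustration, and a real client would also have to cope with servers
that reject the extra parameter.

```python
from urllib.parse import urlencode


def info_refs_url(repo_url, want_resumable=False):
    """Build the initial smart-HTTP discovery URL.

    want_resumable is the hypothetical extra query parameter
    proposed in this thread; it is not part of the current protocol.
    """
    params = [("service", "git-upload-pack")]
    if want_resumable:
        params.append(("want_resumable", "true"))
    return repo_url.rstrip("/") + "/info/refs?" + urlencode(params)


def next_step(status, headers):
    """Decide what to do with the server's answer.

    A 302 with a Location header means the server is pointing us at
    its mostly whole pack; anything else is treated as the ordinary
    ref advertisement.
    """
    if status == 302 and "Location" in headers:
        return ("fetch-pack", headers["Location"])
    return ("advertisement", None)
```

A server that ignores the extra parameter, as http-backend does today,
would simply land in the "advertisement" case.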
>> 3. Updated clients on clone request GET
>> /info/refs?service=git-resumable-clone.
>
> Or just in the non-http case, they would terminate after the ls-remote
> (including capability advertisement) was done and connect again to
> a different service such as git-upload-stale-pack with the resumable
> token to identify the pack.
HTTP supports range requests and existing CDNs speak HTTP, so I
suspect it would work better if the git-resumable-clone service
printed an HTTP URL from which to grab the packfile.
I think the details are something that could be figured out after
trying out the idea with http first, though.
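
To illustrate why plain HTTP is attractive here, this is a minimal
Python sketch of resuming with a range request. The helper names are
hypothetical; the only protocol facts it relies on are the standard
"Range: bytes=N-" request header and the 206 (Partial Content) versus
200 (full body) responses.

```python
import os


def bytes_on_disk(path):
    """Return how many bytes of the pack we already have (0 if none)."""
    try:
        return os.path.getsize(path)
    except FileNotFoundError:
        return 0


def range_header(have):
    """Headers asking the server to continue from byte offset `have`."""
    if have <= 0:
        return {}  # nothing downloaded yet: plain GET
    return {"Range": "bytes=%d-" % have}


def open_mode(status):
    """How to open the local file given the response status.

    206 means the server honored the range, so append; 200 means it
    sent the whole file again, so start over.
    """
    if status == 206:
        return "ab"
    if status == 200:
        return "wb"
    raise ValueError("unexpected status %d" % status)
```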
[...]
>> 5. Clients fetch the file using standard HTTP GET, possibly with
>> byte-ranges to resume.
>
> In the non-http case the git-upload-stale-pack would be rsync with the
> resume token to determine the file name of the pack,
> such that we have resumeability.
How do I tunnel rsync over git protocol?
So I think in the non-http case the git-resumable-clone service would
have to print a URL to be served using a possibly different protocol
(e.g., a signed https URL for getting the file from a service like S3,
or an rsync URL for getting the file using the same ssh creds that
were used for the initial request).
[...]
>> 6. Once stored and indexed with .idx, clients run `git fsck
>> --lost-found` to discover the roots of the pack it downloaded. These
>> are saved as temporary references.
>
> jrn:
> > I suspect we can do even faster by making index-pack do the work
>
> index-pack --check-self-contained-and-connected
--strict + --check-self-contained-and-connected check that the pack
is self-contained. In the process they mark each object that is
reachable from another object in the pack with FLAG_LINK.
The objects not marked with FLAG_LINK are the roots.
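
A toy Python model of that idea, to make the root-finding concrete.
This is not how index-pack is written (that is C, and it sets the flag
on objects while checking connectivity); the dict-of-lists input and the
function name are assumptions for illustration only.

```python
def find_roots(objects):
    """Return ids of objects that nothing else in the pack points to.

    `objects` maps each object id in the pack to the list of ids it
    references (a commit's parents and tree, a tree's entries, ...).
    The `linked` set plays the role of FLAG_LINK.
    """
    linked = set()
    for refs in objects.values():
        for target in refs:
            if target in objects:
                linked.add(target)
    return sorted(oid for oid in objects if oid not in linked)


# Tiny pack: commit c2 -> commit c1, each with its own tree.
pack = {
    "c2": ["c1", "t2"],
    "c1": ["t1"],
    "t1": [],
    "t2": [],
}
roots = find_roots(pack)  # only c2 is unreferenced
```

The roots are what the client would save as temporary references.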
[...]
>> To make step 4 really resume well, clients may need to save the first
>> Location header it gets back from
>> /info/refs?service=git-resumable-clone and use that on resume. Servers
>> are likely to embed the pack SHA-1 in the Location header, and the
>> client wants to use this on subsequent GET attempts to abort early if
>> the server has deleted the pack the client is trying to obtain.
Yes.
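
As a rough Python sketch of the resume check: the client keeps the
first Location it was given and, on a later attempt, gives up early if
that pack is gone. That the URL contains a 40-hex-digit pack SHA-1 is
only the likely server behavior described above, and treating 404/410
as "pack deleted" is my assumption; the names are hypothetical.

```python
import re


def pack_sha1(location):
    """Pull a 40-hex-digit pack id out of a saved Location URL, if any."""
    m = re.search(r"[0-9a-f]{40}", location)
    return m.group(0) if m else None


def resume_action(status):
    """What to do after re-requesting the saved Location."""
    if status in (200, 206):
        return "continue"  # pack still there, keep downloading
    if status in (404, 410):
        return "abort"     # pack deleted: discard partial data, start over
    return "retry"
```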
I really like this design. I'm tempted to implement it (since it
lacks a bunch of the downsides of clone.bundle).
Thanks,
Jonathan