On Mon, Apr 28, 2014 at 09:43:10AM -0700, Junio C Hamano wrote:
> Yes, I'd love to see something along that line in the longer term,
> showing all the objects as just regular objects under the hood, with
> implementation details hidden in the object layer (just like there
> is no distinction between packed and loose objects from the point of
> view of read_sha1_file() users), as a real solution to address
> issues in larger trees.
> Also see http://thread.gmane.org/gmane.comp.version-control.git/241940
> where Shawn had an interesting experiment.
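(The "no distinction under the hood" idea Junio describes could look roughly like the following. This is a purely hypothetical sketch, not git's actual object layer; all of the struct and function names are made up for illustration.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of a pluggable object backend: callers ask for an
 * object by id and never learn whether it came from loose storage, a
 * pack, or some remote store.  Names are illustrative, not git's.
 */
struct object_backend {
	/* return a malloc'd copy of the object, or NULL if not present */
	void *(*read_object)(const char *id, size_t *size);
};

static void *read_from_backends(struct object_backend **chain,
				const char *id, size_t *size)
{
	for (; *chain; chain++) {
		void *buf = (*chain)->read_object(id, size);
		if (buf)
			return buf; /* caller cannot tell which backend hit */
	}
	return NULL;
}

/* two toy backends for demonstration */
static void *always_miss(const char *id, size_t *size)
{
	(void)id; (void)size;
	return NULL;
}

static void *fake_store(const char *id, size_t *size)
{
	if (strcmp(id, "1234"))
		return NULL;
	*size = 5;
	return memcpy(malloc(5), "hello", 5);
}

static struct object_backend miss_backend = { always_miss };
static struct object_backend hit_backend = { fake_store };
```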
Yeah, I think it's pretty clear that a naive high-latency object store
is unusably slow. You mentioned in that thread trying to do pre-fetching
based on commits/trees, and I recall that Shawn's Cassandra experiments
did that (and maybe the BigTable-backed Google Code does, too?).
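(The pre-fetching idea might be sketched like this: whenever a commit is read from the slow store, speculatively queue the objects the walker will want next. Again, every name here is hypothetical, invented for illustration.)

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of commit/tree-driven pre-fetching: when one
 * commit is parsed, queue its tree and parents so a high-latency store
 * can request them in a batch before the history walker asks.
 */
struct commit_info {
	const char *tree;
	const char *parents[4];
	int nr_parents;
};

static const char *prefetch_queue[16];
static int prefetch_nr;

static void queue_prefetch(const char *id)
{
	if (prefetch_nr < 16)
		prefetch_queue[prefetch_nr++] = id;
}

/* called whenever a commit is parsed from the (slow) store */
static void on_commit_read(const struct commit_info *c)
{
	int i;

	queue_prefetch(c->tree);               /* the tree we'll walk next */
	for (i = 0; i < c->nr_parents; i++)
		queue_prefetch(c->parents[i]); /* where the walk continues */
}
```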
There's also a question of deltas. You don't want to get trees or text
blobs individually without deltas, because your total size ends up way
larger than a delta-compressed transfer would be.
But I think for large object support, we can side-step the issue. The
objects will all be blobs (so they cannot refer to anything else), they
will typically not delta well, and the connection setup and latency will
be dwarfed by actual transfer time. My plan was to have all clones fetch
all commits and trees (and small blobs, too), and then download and
cache the large blobs as-needed.
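(The as-needed download-and-cache step might look something like this. It's a toy sketch of the caching idea only, with a simulated remote fetch; none of these names exist in git.)

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of "cache large blobs as-needed": a lookup that
 * consults a local cache first and falls back to a (simulated) remote
 * fetch, recording the result so the transfer happens only once.
 */
struct cached_blob {
	char id[41];
	char *data;
	size_t size;
	struct cached_blob *next;
};

static struct cached_blob *cache;
static int remote_fetches; /* counts simulated network round-trips */

/* stand-in for an expensive network transfer of one large blob */
static char *fetch_from_remote(const char *id, size_t *size)
{
	remote_fetches++;
	*size = strlen(id);
	return memcpy(malloc(*size), id, *size);
}

static const char *get_large_blob(const char *id, size_t *size)
{
	struct cached_blob *p;

	for (p = cache; p; p = p->next)
		if (!strcmp(p->id, id)) {
			*size = p->size; /* cache hit: no network */
			return p->data;
		}

	p = malloc(sizeof(*p));
	strncpy(p->id, id, sizeof(p->id) - 1);
	p->id[sizeof(p->id) - 1] = '\0';
	p->data = fetch_from_remote(id, &p->size);
	p->next = cache;
	cache = p;
	*size = p->size;
	return p->data;
}
```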
That doesn't help with repositories where the actual commit history or
tree size is a problem. But we already have shallow clones to help with
the former. And for the latter, I think we would want a narrow clone
that behaves differently than what I described above. You'd probably
want a specific "widen" operation that would fetch all of the objects
for the newly-widened part of the tree in one go (including deltas), and
you wouldn't want it to happen on an as-needed basis.
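(The "widen in one go" operation could be sketched as below: collect every object id under the newly-widened path prefix and hand the whole list to a single batch request, rather than faulting objects in one at a time. All names and the flat tree model are invented for illustration.)

```c
#include <assert.h>
#include <string.h>

/* one entry of a (toy) flattened tree: full path plus object id */
struct tree_entry {
	const char *path;
	const char *id;
};

static int round_trips; /* counts simulated fetch requests */

/* stand-in for one request that can carry many objects (and deltas) */
static int batch_fetch(const char **ids, int n)
{
	(void)ids;
	round_trips++;
	return n;
}

/* fetch everything under "prefix" with a single round trip */
static int widen(const struct tree_entry *entries, int n, const char *prefix)
{
	const char *wanted[64];
	int i, count = 0;

	for (i = 0; i < n; i++)
		if (!strncmp(entries[i].path, prefix, strlen(prefix)))
			wanted[count++] = entries[i].id;

	/* one round trip for the whole subtree, not one per object */
	return count ? batch_fetch(wanted, count) : 0;
}
```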