On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote:
> Hi,
>
> On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> > Taylor Blau wrote:
>
> >> Right, though this still subjects the remote copy to all of the
> >> difficulty of packing large objects (though Christian's work to support
> >> other object database implementations would go a long way to help this).
> >
> > Ah, interesting -- I didn't realize this step was part of the
> > bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> > binary files when it packed them up. Or do you mean the growing cost
> > of storing the objects locally as you work? Perhaps that could be
> > solved by allowing the client more control (i.e., delete the oldest
> > blobs that exist on the server).
>
> John, I believe you are correct.  Taylor, can you elaborate on what
> packing overhead you are referring to?

Jonathan, you are right. I was also referring to the increased time that
Git would spend trying to find good delta chains in packfiles that
contain larger, non-textual objects. I haven't done any hard
benchmarking on this, so it may be a moot point.

> In other words, using a rolling hash to decide where to split a blob
> and a tree-like structure so that (1) common portions between
> files can be deduplicated and (2) portions can be hashed in parallel.

I think that this is worth discussing further. Certainly, it would go a
good bit of the way toward addressing the point that I responded to
earlier in this message. A rough sketch of what I understand by that
approach is below.
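
To make the rolling-hash idea concrete, here is a minimal sketch of
content-defined chunking using a "gear" hash. This is not proposed Git
code; the hash variant, the 13-bit mask (giving roughly 8 KiB average
chunks), and the minimum chunk size are all illustrative assumptions
on my part:

    /*
     * Sketch of content-defined chunking with a rolling ("gear")
     * hash. Illustrative only; the constants below are assumptions,
     * not anything Git does today.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define CHUNK_MASK ((1u << 13) - 1)	/* ~8 KiB average chunks */
    #define MIN_CHUNK  2048			/* avoid degenerate tiny chunks */

    static uint32_t gear[256];

    /* Fill the gear table with deterministic pseudo-random values. */
    static void init_gear(void)
    {
    	uint32_t x = 0x9e3779b9u;
    	for (int i = 0; i < 256; i++) {
    		x ^= x << 13; x ^= x >> 17; x ^= x << 5; /* xorshift32 */
    		gear[i] = x;
    	}
    }

    /*
     * Emit chunk boundaries for a buffer. Because the hash depends
     * only on recently seen bytes, inserting data early in the file
     * shifts at most a boundary or two instead of re-splitting
     * everything after the insertion point.
     */
    static void split(const unsigned char *buf, size_t len)
    {
    	uint32_t h = 0;
    	size_t start = 0;

    	for (size_t i = 0; i < len; i++) {
    		h = (h << 1) + gear[buf[i]];
    		if (i - start + 1 >= MIN_CHUNK && !(h & CHUNK_MASK)) {
    			printf("chunk: offset %zu, length %zu\n",
    			       start, i - start + 1);
    			start = i + 1;
    			h = 0;
    		}
    	}
    	if (start < len)
    		printf("chunk: offset %zu, length %zu\n",
    		       start, len - start);
    }

    int main(void)
    {
    	/* Toy input; in practice this would be the blob's contents. */
    	static unsigned char buf[1 << 20];
    	size_t n = fread(buf, 1, sizeof(buf), stdin);

    	init_gear();
    	split(buf, n);
    	return 0;
    }

Each resulting chunk would then be hashed and stored as its own
object, with a small tree-like object listing the chunk hashes in
order; that is where the deduplication in (1) and the parallel hashing
in (2) would come from.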

Thanks,
Taylor
