On Mon, Sep 02, 2013 at 10:05:07AM +0700, Nguyen Thai Ngoc Duy wrote:

> Current code peeks into the transferred pack's header: if the number of
> objects is under a limit, unpack-objects is called to handle the rest;
> otherwise index-pack is. This patch makes fetch-pack use index-pack
> unconditionally, then turns objects loose and removes the pack at the
> end. unpack-objects is deprecated and may be removed in the future.

I do like consolidating the object-receiving code paths, but there is a
downside to this strategy: we increase the I/O in cases where we end up
unpacking, as we spool the tmpfile to disk, and then force objects loose
(whereas with the current code, unpack-objects reads straight from the
network into loose objects). I think that is what you're saying here:

>  - by going through index-pack first, then unpack, we pay extra cost
>    for completing a thin pack into a full one. But compared to fetch's
>    total time, it should not be noticeable because unpack-objects is
>    only called when the pack contains a small number of objects.

...but the cost is paid by total pack size, not number of objects. So if
I am pushing up a commit with a large uncompressible blob, I've
effectively doubled my disk I/O. It would make more sense to me for
index-pack to learn command-line options specifying the limits, and then
to operate on the pack as it streams in. E.g., to decide after seeing
the header to unpack rather than index, or to drop large blobs from the
pack (and put them in their own pack directly) as we are streaming into
it (we do not know the blob size ahead of time, but we can make a good
guess if it has a large on-disk size in the pack).
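To make the "decide after seeing the header" idea concrete, here is a
minimal sketch (not git's actual implementation) of peeking at the
12-byte pack header, which is the magic "PACK", a big-endian version,
and a big-endian object count. The limit of 100 mirrors git's default
unpack limit; `choose_handler` is a hypothetical name for illustration.

```python
import struct

def peek_pack_header(stream):
    """Read the 12-byte pack header: b"PACK" magic, then two
    big-endian uint32s (version, object count)."""
    header = stream.read(12)
    magic, version, count = struct.unpack(">4sII", header)
    if magic != b"PACK":
        raise ValueError("not a pack stream")
    return version, count

def choose_handler(count, unpack_limit=100):
    """Hypothetical decision mirroring the current fetch-pack logic:
    explode small packs into loose objects, index large ones."""
    return "unpack-objects" if count < unpack_limit else "index-pack"
```

The point of moving this logic into index-pack itself is that the same
process could read the header, make this choice, and keep consuming the
stream without spooling the pack to disk first.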
