Jeff King <p...@peff.net> writes:

> The vast majority of blobs in git.git will be stored as packed deltas.
> That means the streaming code will fall back to doing the regular
> in-core access. We _could_ therefore use that in-core copy to do our
> sha1 check rather than streaming; but of course we never get access to
> it outside of stream_blob_to_fd, and it is discarded. However, we do
> keep a copy in the delta base cache. When we immediately ask to unpack
> the exact same entry for check_sha1_signature, we can pull the copy
> straight out of the cache without having to re-inflate the object.

OK, that explains why the 20% overhead is lower than one would
naïvely expect.  Thanks.
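
For readers following along, here is a toy sketch of why the second
access is cheap. It is not git's actual delta base cache: the class
name, the (pack, offset) key and the use of plain zlib data are all
assumptions made for illustration, and the real cache is bounded and
evicts entries.

```python
import zlib


class ToyBaseCache:
    """Hypothetical stand-in for a cache of unpacked entries (not git's code)."""

    def __init__(self):
        self.entries = {}       # key -> inflated bytes
        self.inflate_calls = 0  # how often we paid for inflation

    def unpack(self, key, packed):
        # A second request for the same entry is served from the cache.
        if key in self.entries:
            return self.entries[key]
        self.inflate_calls += 1
        data = zlib.decompress(packed)
        self.entries[key] = data
        return data


cache = ToyBaseCache()
packed = zlib.compress(b"hello world\n" * 1000)
key = ("pack-1", 12)  # made-up (pack, offset) pair

first = cache.unpack(key, packed)   # stands in for the write-out path
second = cache.unpack(key, packed)  # stands in for the immediate sha1 check
print(cache.inflate_calls)
```

The second call returns the cached copy, so the cost of the extra
check is mostly the hashing itself rather than a second inflation.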

> Yes, I think it is a reasonable addition to the streaming API. However,
> I do not think there are any callsites which would currently want it.
> All of the current users of stream_blob_to_fd use read_sha1_file as
> their alternative, and not parse_object. So we are not verifying the
> sha1 in either case (we may want to change that, of course, but that is
> a bigger decision than just trying to bring streaming and non-streaming
> code-paths into parity).

True. Offhand I am not sure whether we want to make read_sha1_file()
rehash, but I agree that it is a separate question from the one we are
asking in this discussion.

> I also wondered if parse_object itself had problems with double-reading
> or failing to verify. But its use goes the opposite direction; it wants
> to verify the sha1 of the blob object, but it knows that it does not
> actually need the data. So it streams (as of 090ea12) to check the
> signature, but then discards each buffer-full after hashing it.
>
> -Peff
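
The "hash each buffer-full, then discard it" idea described above can
be sketched as follows. This is an illustration in Python, not the
code in parse_object; the function names and the chunk size are
assumptions. The only git-specific fact relied on is that a blob's
object name is the SHA-1 of "blob <size>\0" followed by the contents.

```python
import hashlib
import io


def blob_sha1_streaming(stream, size, chunk_size=8192):
    """Hash a blob chunk by chunk; no chunk is kept after it is hashed."""
    h = hashlib.sha1()
    h.update(b"blob %d\0" % size)  # object header precedes the contents
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        h.update(buf)  # buf is dropped on the next iteration
    return h.hexdigest()


def blob_sha1_in_core(data):
    """Same hash, computed with the whole blob held in memory."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


data = b"x" * 100000
streamed = blob_sha1_streaming(io.BytesIO(data), len(data))
print(streamed == blob_sha1_in_core(data))
```

Both paths produce the same name; the streaming one simply never
holds more than one chunk at a time, which is the point when the
caller only wants verification and not the data.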
