On Fri, Oct 05, 2018 at 06:44:25PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Some version of the former. Ones where we haven't found any (or much of)
> useful deltas yet. E.g. say I had a repository with a lot of files
> generated by this command at various points in the history:
>
> dd if=/dev/urandom of=file.binary count=1024 bs=1024
>
> Some script similar to git-sizer which could report that the
> packed+compressed+delta'd version of the 10 *.binary files I had in my
> history had a 1:1 ratio of how large they were in .git, v.s. how large
> the sum of each file retrieved by "git show" was (i.e. uncompressed,
> un-delta'd).

You can get the uncompressed and on-disk sizes with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objectsize) %(objectsize:disk)'

and then compare the sizes/ratios however you like. If you want just a
subset of the blobs, drop the "--batch-all-objects" and just feed the
object names (or even "HEAD:filename") on stdin.
> That doesn't mean that tomorrow I won't commit 10 new objects which
> would have a really good delta ratio to those 10 existing files,
> bringing the ratio to ~1:2, but if I had some report like:
>
> <ratio> <extension>
>
> For a given repo that could be fed into .gitattributes to say we
> shouldn't bother to delta files of certain extensions.

I don't know of a tool that does that, but I think a modest application
of perl to the cat-file output would produce it.
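
Here is an untested sketch of what I mean, using awk rather than perl.
The ext_ratio name and the "(none)" bucket are my own inventions; the
paths come from "git rev-list --objects", which cat-file passes through
as %(rest):

```shell
# Sketch: read "<type> <objectsize> <objectsize:disk> <path>" lines on stdin
# and print "<ratio> <extension>" per extension, where ratio is the summed
# on-disk size divided by the summed uncompressed size of the blobs.
ext_ratio() {
    awk '
    $1 == "blob" && NF >= 4 {
        name = $NF                      # last word of the path
        sub(/.*\//, "", name)           # strip leading directories
        ext = "(none)"
        if (match(name, /\.[^.]+$/))    # text after the final dot
            ext = substr(name, RSTART + 1)
        size[ext] += $2
        disk[ext] += $3
    }
    END {
        for (ext in size)
            if (size[ext] > 0)
                printf "%.2f %s\n", disk[ext] / size[ext], ext
    }' | sort -n
}

# Usage, inside a repository:
#   git rev-list --objects --all |
#   git cat-file --batch-check='%(objecttype) %(objectsize) %(objectsize:disk) %(rest)' |
#   ext_ratio
```

Extensions whose ratio stays close to 1.00 would be the candidates for
something like "*.binary -delta" in .gitattributes. Note that rev-list
reports each object only once, under the first path it was found at, so
a blob that appears under several names is counted for just one of them.
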
-Peff