On Thu, Mar 20, 2014 at 5:11 AM, Junio C Hamano <gits...@pobox.com> wrote:
> David Kastrup <d...@gnu.org> writes:
>> Junio C Hamano <gits...@pobox.com> writes:
>>> David Kastrup <d...@gnu.org> writes:
>>>> The default of 16MiB causes serious thrashing for large delta chains
>>>> combined with large files.
>>>>
>>>> Signed-off-by: David Kastrup <d...@gnu.org>
>>> Is that a good argument? Wouldn't the default of 128MiB burden
>>> smaller machines with bloated processes?
>> The default file size before Git forgets about delta compression is
>> 512MiB. Unpacking 500MiB files through a 16MiB delta base cache is
>> going to be uglier than that.
>> Documentation/config.txt states:
>> Maximum number of bytes to reserve for caching base objects
>> that may be referenced by multiple deltified objects. By storing
>> the entire decompressed base objects in a cache Git is able to
>> avoid unpacking and decompressing frequently used base objects
>> multiple times.
>> Default is 16 MiB on all platforms. This should be reasonable
>> for all users/operating systems, except on the largest projects.
>> You probably do not need to adjust this value.
>> I've seen this seriously screwing performance in several projects of
>> mine that don't really count as "largest projects".
>> So the description in combination with the current setting is clearly wrong.
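
(As an aside for anyone bitten by this today: the cache can already be
raised per repository without any code change; the 128m below is only an
illustrative value, not a recommendation.)

    git config core.deltaBaseCacheLimit 128m
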
> That is good material for a proposed log message, and I think you
> are onto something here.
> I know that the 512MiB default for bigFileThreshold (aka
> "forget about delta compression") came out of thin air. It was just
> "1GB is always too huge for anybody, so let's cut it in half and
> declare that value the initial version of a sane threshold",
> nothing more.
> So it might be that the problem is that 512MiB is too big relative
> to the 16MiB of delta base cache, and the former may be what needs
> to be tweaked. If a blob close to but below 512MiB is a problem for
> a 16MiB delta base cache, it would still be big enough to cause the
> same problem for a 128MiB delta base cache---it would evict all the
> other objects and then end up not fitting in the limit itself,
> busting the limit immediately, no?
> I would understand if the change were to update the definition of
> deltaBaseCacheLimit and link it to the value of bigFileThreshold,
> for example. With the discussion presented so far, I am still not
> sure we can say that bumping deltaBaseCacheLimit is the right
> solution to the "description in combination with the current setting
> is clearly wrong" problem (which is a real issue).
I vote to make big_file_threshold smaller. 512MB is already unfriendly
to many smaller machines. I'm thinking somewhere around 32MB-64MB
(and maybe increasing the delta base cache limit to match). The only
downside I see is that large blobs will be packed undeltified, which
could increase pack size if you have lots of them. But maybe we could
improve pack-objects/repack/gc to deltify large blobs anyway if
they're old enough.
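
To make that concrete, it is roughly what a user can already set by hand
today (the values below are only illustrative, not tested recommendations):

    git config --global core.bigFileThreshold 64m
    git config --global core.deltaBaseCacheLimit 64m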