Re: [git-users] Git and binary files ... once more

Konstantin Khomoutov Wed, 16 Jul 2014 07:07:38 -0700

On Wed, 16 Jul 2014 05:21:50 -0700 (PDT)
Dominik Rauch <[email protected]> wrote:


[...]

> Main question:
> The existence of tools like git-annex, git-fat, git-media, etc. hints
> that Git has problems with binary files in some way. Although I've
> studied as much internal docs as I could find, I could not find a
> clue why Git should handle binary files any worse than Subversion
> did. - Yes the repository size may get huge, however, initial cloning
> is a one-time process and does not affect our company too much.
> 
> Does Git even have a problem with binary files? What's the problem
> exactly? How does Subversion handle this in a better way? Is it about
> single files which are very huge (e.g. 3D models with more than 500MB
> file size) or are many small binary files a problem as well? Is it
> about initial cloning time only or does it affect the everyday work
> (committing, branching, etc.) as well?

As I perceive this, only these problems exist with binary files in Git:

1) Git always compresses objects it writes; and after a certain
   threshold it compacts "loose" object files into the so-called
   "packfiles" which are big indexed archives.

   What matters is that all these [de]compression operations are
   performed "in core" -- that is, a file is slurped in, operated upon,
   then written out.  So you ought to have enough free physical memory
   to do all of that.

2) There's no way to sensibly diff binary files.

   Note that this is not specific to DVCS or Git in particular.

3) A DVCS, of which Git is an instance, does not typically
   support "locking" files.  This is simply because in the DVCS model
   there's no single "authority" which would hold such information
   *and enforce the policy* based on it.

   And while the DVCS model rocks for collaborating on *mergeable*
   bits of information, for opaque binary files (see (2) above),
   this usually sucks.

   Certain front-ends like gitolite do support this [2], but obviously
   this is clunky and requires a policy of always using a centralized
   repository as a rendezvous point for all the development,
   which might be suboptimal.
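
Regarding (1) above: since each file is processed whole, in memory, it
helps to know how big your biggest files really are.  A minimal sketch
for finding that out; the function name and the choice to skip the
".git" directory are mine, not anything Git provides:

```python
import os

def largest_files(root, top=10):
    """Return up to `top` (size_in_bytes, path) pairs under `root`, biggest first."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own metadata directory; only working-tree files matter here.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                found.append((os.path.getsize(path), path))
    found.sort(reverse=True)
    return found[:top]

# Example: print the ten biggest files of a working tree in MiB.
# for size, path in largest_files("."):
#     print(f"{size / 2**20:10.1f} MiB  {path}")
```

Compare the sizes at the top of that list with the memory available to
Git on the machines in question.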

> Note: we're using Git on Windows, if that's important in any way.

It might be important because binary installers GfW publishes are
32-bit only.  A 32-bit process on Windows has a 4GiB virtual address
space, and by default the upper 2GiB of it is reserved for the kernel,
so realistically such an application can only address 2GiB of virtual
memory.  Thanks to address space fragmentation, the biggest contiguous
chunk any real-world application can allocate under these conditions
is less than that value, and the actual limit is only detectable at
runtime (for instance, C's malloc() returns NULL).

What this means is that memory-hungry operations like compressing loose
objects and creating packfiles might hit that memory limit.
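
If you are unsure whether the git.exe you have installed is a 32-bit
build, the PE header of the executable tells you.  A rough sketch which
reads the "Machine" field of the COFF header and the "large address
aware" flag (which lets a 32-bit process use more address space where
the OS permits it).  The function names are mine, and it assumes the
headers sit within the first 4KiB of the file, which is the usual case:

```python
import struct

MACHINE_NAMES = {
    0x014C: "x86 (32-bit)",
    0x8664: "x64 (64-bit)",
    0xAA64: "ARM64 (64-bit)",
}
LARGE_ADDRESS_AWARE = 0x0020  # IMAGE_FILE_LARGE_ADDRESS_AWARE

def describe_pe(data):
    """Return (machine_name, large_address_aware) for the bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE executable (no MZ header)")
    # The 4-byte little-endian value at 0x3C is the offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The COFF header follows the signature: Machine is its first field,
    # Characteristics sits 18 bytes into it.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    (flags,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    name = MACHINE_NAMES.get(machine, f"unknown (0x{machine:04x})")
    return name, bool(flags & LARGE_ADDRESS_AWARE)

def describe_pe_file(path):
    """Same as describe_pe(), reading the headers from the file at `path`."""
    with open(path, "rb") as f:
        return describe_pe(f.read(4096))
```

Call describe_pe_file() with the path to your own git.exe; where that
file lives depends on your installation.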

[...]

From your problem description it seems that you should be fine using
Git if you do not require locking of files.

On the other hand, you seem to have fallen into the usual pitfall of
wanting someone else to just look at your requirements and somehow know
whether it will work or not.  In reality, that's not how things work: as
with performance optimization, bottlenecks really occur in places where
you do not expect them to appear.  So the advice is: test!  Create several
test repositories, populate them with typical data, use Git on them on
systems similar to those you'll use in production.

Scripting the creation of a test repository with the required number of
files of the required size, and making and committing artificial changes
to them, is not that hard after all.
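
Something along these lines could serve as a starting point.  It is only
a sketch: the function names, file names, default sizes and the dummy
committer identity are all made up for illustration, it assumes "git" is
on the PATH and file_size is well above zero, and random data is the
worst case for compression, so your real assets may behave better:

```python
import os
import subprocess

def git(repo, *args):
    """Run a git command inside `repo`, raising if it fails."""
    subprocess.run(["git", *args], cwd=repo, check=True,
                   stdout=subprocess.DEVNULL)

def make_test_repo(repo, n_files=5, file_size=1 << 20, n_commits=3):
    """Create a repository of random binary files and commit artificial changes."""
    os.makedirs(repo, exist_ok=True)
    git(repo, "init", "--quiet")
    # Set an identity locally so commits work without any global config.
    git(repo, "config", "user.name", "Test User")
    git(repo, "config", "user.email", "test@example.invalid")
    paths = [os.path.join(repo, f"asset{i:04d}.bin") for i in range(n_files)]
    for path in paths:
        with open(path, "wb") as f:
            f.write(os.urandom(file_size))  # random, so effectively incompressible
    git(repo, "add", "--all")
    git(repo, "commit", "--quiet", "-m", "Initial import")
    for c in range(1, n_commits):
        for path in paths:
            # Overwrite a small slice in the middle to imitate an edited binary.
            with open(path, "r+b") as f:
                f.seek(file_size // 2)
                f.write(os.urandom(min(4096, file_size - file_size // 2)))
        git(repo, "commit", "--quiet", "-a", "-m", f"Artificial change {c}")
```

Run it with the file count and size matching your real data, then time
the operations you care about (committing, "git gc", cloning) on the
actual target machines.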

> (PS: I've asked similar questions on non-official forums a few weeks
> ago and haven't got any satisfying answers)

This mailing list is non-official as well. The only one which is
official is the main Git list which is frequented by Git developers.
Please refer to [1] on how to get there.

1. https://gist.github.com/tfnico/4441562
2. http://gitolite.com/gitolite/locking.html
