On Wed, 16 Jul 2014 05:21:50 -0700 (PDT) Dominik Rauch <[email protected]> wrote:
[...]

> Main question:
> The existence of tools like git-annex, git-fat, git-media, etc. hints
> that Git has problems with binary files in some way. Although I've
> studied as much internal docs as I could find, I could not find a
> clue why Git should handle binary files any worse than Subversion
> did. - Yes the repository size may get huge, however, initial cloning
> is a one-time process and does not affect our company too much.
>
> Does Git even have a problem with binary files? What's the problem
> exactly? How does Subversion handle this in a better way? Is it about
> single files which are very huge (e.g. 3D models with more than 500MB
> file size) or are many small binary files a problem as well? Is it
> about initial cloning time only or does it affect the everyday work
> (committing, branching, etc.) as well?

As I perceive this, only these problems exist with binary files in Git:

1) Git always compresses objects it writes; and after a certain
   threshold it compacts "loose" object files into the so-called
   "packfiles", which are big indexed archives.

   What matters is that all these [de]compression operations are
   performed "in core" -- that is, a file is slurped in, operated upon,
   then written out. So you ought to have enough free physical memory
   to do all of that.

2) There's no way to sensibly diff binary files.
   Note that this is not specific to DVCS or Git in particular.

3) A DVCS, which Git is an instance of, does not typically support
   "locking" files. This is simply because in the DVCS model there's no
   single "authority" which would hold such information *and enforce
   the policy* based on it. And while the DVCS model rocks for
   collaborating on *mergeable* bits of information, for opaque binary
   files (see (2) above) it usually sucks.

   Certain front-ends like gitolite do support locking [2], but
   obviously this is clunky and requires a policy of always using a
   centralized repository as a rendezvous point for all the
   development, which might be suboptimal.
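To make the "in core" point in (1) a bit more concrete, here is a minimal Python sketch. It is not Git's code and says nothing about how Git is implemented internally; the function names are made up for illustration. It only contrasts compressing a whole buffer at once (input and output both held in memory) with compressing the same data chunk by chunk, using the standard zlib module.

```python
import io
import zlib


def compress_in_core(data: bytes) -> bytes:
    """Compress a whole buffer at once.

    The complete input and the complete output are both held in memory,
    so peak usage grows with the size of the file.
    """
    return zlib.compress(data)


def compress_streaming(src, dst, chunk_size: int = 64 * 1024) -> None:
    """Compress from one file-like object to another, chunk by chunk.

    Only about chunk_size bytes of input are held in memory at a time.
    """
    compressor = zlib.compressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(compressor.compress(chunk))
    # Emit whatever the compressor is still buffering.
    dst.write(compressor.flush())


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096  # 1 MiB stand-in for a binary asset
    out = io.BytesIO()
    compress_streaming(io.BytesIO(payload), out)
    # Both approaches produce a valid zlib stream for the same data.
    assert zlib.decompress(compress_in_core(payload)) == payload
    assert zlib.decompress(out.getvalue()) == payload
```

With the first approach a 500MB model needs at least that much free memory just to hold the input, which is the situation described in (1).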

> Note: we're using Git on Windows, if that's important in any way.

It might be important because the binary installers GfW publishes are
32-bit only. A 32-bit process on Windows has a 4GiB virtual address
space, and by default the upper 2GiB of it is reserved for the kernel,
so realistically such an application can address only about 2GiB of
memory. Thanks to address space fragmentation, the biggest contiguous
memory chunk any real-world application might allocate under these
conditions is less than that value, and the actual limit is only
detectable at runtime (for instance, C's malloc() returns NULL).

What this means is that memory-hungry operations like compressing loose
objects and creating packfiles might hit that memory limit.

[...]

From your problem description it seems that you should be fine using
Git if you do not require locking of files.

On the other hand, you seem to have fallen into the usual pitfall of
wanting someone else to just look at your requirements and somehow know
whether it will work or not. In reality, that's not how things work: as
with performance optimization, bottlenecks really occur in places where
you do not expect them to appear. So the advice is: test! Create
several test repositories, populate them with typical data, and use Git
on them on systems similar to those you'll use in production. Scripting
the creation of a test repository with the required number of files of
the required size, and making and committing artificial changes in
them, is not that hard after all.

> (PS: I've asked similar questions on non-official forums a few weeks
> ago and haven't got any satisfying answers)

This mailing list is non-official as well. The only one which is
official is the main Git list, which is frequented by Git developers.
Please refer to [1] on how to get there.

1. https://gist.github.com/tfnico/4441562
2. http://gitolite.com/gitolite/locking.html

-- 
You received this message because you are subscribed to the Google Groups "Git for human beings" group.
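
As a rough illustration of the "test!" advice in the message above, scripting a test repository could look something like the Python sketch below. Everything in it is an assumption made for the example: the helper names, the asset_NNNN.bin file names, the commit messages and the choice of random bytes as a stand-in for already-compressed binary data. It only shells out to plain `git init`, `git config`, `git add` and `git commit`, and expects `git` to be on the PATH.

```python
import os
import random
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> None:
    """Run a git command inside repo and fail loudly if it fails."""
    subprocess.run(["git", *args], cwd=repo, check=True)


def write_random_file(path: Path, size: int) -> None:
    """Create a file of size random bytes (incompressible, like most media)."""
    path.write_bytes(os.urandom(size))


def mutate_file(path: Path, n_bytes: int, rng: random.Random) -> None:
    """Change n_bytes randomly chosen bytes of a non-empty file in place."""
    size = path.stat().st_size
    with path.open("r+b") as f:
        for _ in range(n_bytes):
            pos = rng.randrange(size)
            f.seek(pos)
            old = f.read(1)[0]
            # Adding 1..255 modulo 256 always yields a different byte value.
            new = (old + rng.randrange(1, 256)) % 256
            f.seek(pos)
            f.write(bytes([new]))


def build_test_repo(repo: Path, n_files: int, file_size: int,
                    n_commits: int, seed: int = 0) -> None:
    """Create a repository of binary files and commit artificial changes."""
    rng = random.Random(seed)
    repo.mkdir(parents=True, exist_ok=True)
    git(repo, "init", "-q")
    # Set a local identity so committing works on a freshly set up machine.
    git(repo, "config", "user.name", "Test User")
    git(repo, "config", "user.email", "test@example.invalid")

    files = [repo / f"asset_{i:04d}.bin" for i in range(n_files)]
    for path in files:
        write_random_file(path, file_size)
    git(repo, "add", ".")
    git(repo, "commit", "-q", "-m", "Initial import")

    for n in range(n_commits):
        mutate_file(rng.choice(files), 16, rng)
        git(repo, "commit", "-q", "-a", "-m", f"Artificial change {n}")
```

Calling, say, `build_test_repo(Path("binary-test-repo"), n_files=20, file_size=1024 * 1024, n_commits=10)` gives a small repository to start with; the numbers should then be raised to match the real data (for instance a few 500MB files) before timing commits, `git gc` and clones on a machine similar to the production ones.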
