On Thursday, November 7, 2013 11:14:31 PM UTC+1, Gergely Polonkai wrote:
> I know it was on topic several times before, but today this problem also
> came to me.
> I have a project tracked by git, which contained source code for both a
> binary and a library. The library had several relatively large (~30 MB
> each, ~400 MB total) data files, which were generated by an external
> program, and will never be modified later.
> As the project matured, it turned out that the library can be a standalone
> product, so we have moved data/ into lib-src/, and “exported” that
> directory to a separate repository with git-subtree. We don’t plan to merge
> them together any more. This, however, made the checkout of the binary’s
> repository a bit slow, as the index still holds these files.
> Some additional information: before the subtree operation, the binary was
> more like a test suite for the library, only the last few commits may
> contain relevant (e.g. code that is still in the binary) changes to both
> the binary and the lib.
> My question is: is it a good idea to remove these now quasi-unused files
> from the index?
Version control systems track every version of every file that was ever in
the repository. If you want to shrink the repository by removing stuff, you
have to go through history and erase every trace that the file ever
existed in order to actually win back some space.
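Before deciding anything, it can help to see which files in history actually take up the space. A minimal sketch, assuming a POSIX shell and that you run it inside the repository; the function name largest_blobs is made up for illustration:

```shell
# List the largest blobs reachable from any ref, biggest first.
# Output format: "<size in bytes> <path>".
# Optional argument: how many entries to show (default 10).
largest_blobs() {
    # rev-list prints "<sha> <path>" for every object in history;
    # cat-file looks up type and size and passes the path through as %(rest).
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |   # keep blobs only, drop the type column
        sort -rn |
        head -n "${1:-10}"
}
```

Calling largest_blobs 20 in the binary's repository should show whether the old data files really dominate, including files that were deleted from the working tree long ago.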
In many cases, the large files should never have been added in the first
place, so it's safe to remove them from history. In other cases, the files
might have been important for the consistency of things at a certain time,
and keeping them in history has value.
From an external point of view, it's hard to say which aspects of history
are important for you to keep.
Once you decide that you want to remove files from history, I do recommend
trying out the BFG repo cleaner, as it is quite user-friendly and handles the
common use cases really well: http://rtyley.github.io/bfg-repo-cleaner/
If that doesn't get you where you want, the standard Git tool for such
things is git filter-branch in its various modes
(google for more examples and how-to's).
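For the filter-branch route, a rough sketch of one common recipe might look like this. It assumes you run it in a fresh clone with a clean working tree, and that the path to drop is a directory such as data from the question; the function name purge_path_from_history is made up for illustration:

```shell
# Remove a path from every commit on every ref, then reclaim the space.
# Usage: purge_path_from_history <path>
purge_path_from_history() {
    path_to_purge=$1

    # Rewrite all refs, deleting the path from each commit's index.
    # FILTER_BRANCH_SQUELCH_WARNING silences the warning (and pause)
    # that newer Git versions show before filter-branch starts.
    FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --force \
        --index-filter "git rm -r -q --cached --ignore-unmatch -- '$path_to_purge'" \
        --prune-empty --tag-name-filter cat -- --all

    # filter-branch keeps the old commits under refs/original and in the
    # reflog; both have to go before gc can actually delete the blobs.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do git update-ref -d "$ref"; done
    git reflog expire --expire=now --all
    git gc --prune=now --quiet
}
```

In the situation described above this would be something like purge_path_from_history data. Commits that touched only that path become empty and are dropped by --prune-empty, and every remaining commit gets a new hash.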
Note that once you rewrite history, collaborators will have to re-clone the
newly rewritten repository. Also make sure to take backups before you start.
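One simple way to take such a backup is a mirror clone, which copies every branch and tag into a bare repository. A minimal sketch; the function name backup_repo and both paths are placeholders:

```shell
# Copy all refs of a repository into a bare mirror clone.
# Usage: backup_repo <source-repo> <backup-dir>
backup_repo() {
    src=$1
    dest=$2
    git clone --quiet --mirror "$src" "$dest"
}
```

For example, backup_repo . ../binary-backup.git run from the repository root. If the rewrite goes wrong, you can clone again from the backup directory.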