On Sun, Sep 23, 2018 at 9:05 AM Lars Schneider <larsxschnei...@gmail.com> wrote:
> I recently had to purge files from large Git repos (many files, many commits).
> The usual recommendation is to use `git filter-branch --index-filter` to purge
> files. However, this is *very* slow for large repos (e.g. it takes 45min to
> remove the `builtin` directory from git core). I realized that I can remove
> files *way* faster by exporting the repo, removing the file references,
> and then importing the repo (see Perl script below, it takes ~30sec to remove
> the `builtin` directory from git core). Do you see any problem with this
> approach?

A couple of comments:

For purging files from a repository's history, take a look at BFG[1],
which bills itself as "a simpler, faster alternative to
git-filter-branch for cleansing bad data out of your Git repository
history".
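
For your example, the BFG invocation would look something along these
lines (the bare-repo path is illustrative; double-check the flags
against BFG's documentation for your version):

    java -jar bfg.jar --delete-folders builtin my-repo.git
    cd my-repo.git
    git reflog expire --expire=now --all
    git gc --prune=now --aggressive

BFG rewrites the history but leaves the old objects in place, hence
the reflog-expire and gc steps to actually reclaim the space.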

The approach of exporting to a fast-import stream, modifying the
stream, and re-importing is quite reasonable. However, rather than
re-inventing, take a look at reposurgeon[2], which allows you to do
major surgery on fast-import streams. Not only can it purge files from
a repository, but it can slice, dice, puree, and saute pretty much any
attribute of a repository.
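
If you do want to roll your own, the skeleton of such a filter is
small. Here's a minimal Python sketch in the spirit of your Perl
script (not reproduced here); the builtin/ prefix is taken from your
example, and note the caveat that paths containing special characters
are emitted quoted by fast-export and are not unquoted here:

    #!/usr/bin/env python3
    # Minimal sketch of the export/filter/import approach.
    # Run it in a fresh clone as:
    #
    #   git fast-export --all | ./purge-dir.py | git fast-import --force
    #
    # Caveat: quoted/escaped paths from fast-export are not handled.
    import sys

    PURGE = b'builtin/'   # directory prefix to drop (from the example)
    inp, out = sys.stdin.buffer, sys.stdout.buffer

    while True:
        line = inp.readline()
        if not line:
            break
        if line.startswith(b'data '):
            # Copy `data <n>` payloads verbatim so commit messages and
            # blob contents are never mistaken for stream commands.
            out.write(line)
            out.write(inp.read(int(line.split()[1])))
            continue
        if line.startswith(b'M '):    # filemodify: M <mode> <ref> <path>
            path = line.split(b' ', 3)[3]
        elif line.startswith(b'D '):  # filedelete: D <path>
            path = line.split(b' ', 1)[1]
        else:
            out.write(line)
            continue
        if not path.startswith(PURGE):
            out.write(line)           # keep entries outside the directory

Blobs that end up unreferenced after the filter are still imported
into the object store; a final `git gc --prune=now` in the rewritten
repository reclaims them.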

[1]: https://rtyley.github.io/bfg-repo-cleaner/
[2]: http://www.catb.org/esr/reposurgeon/
