Re: Repository splitting script

Alex Heneveld Sun, 06 Dec 2015 04:08:04 -0800


Richard-  Awesome job.


Tony-  Thanks for the suggestion.

What Richard is doing is very cool. `git filter-branch --index-filter` efficiently acts on the git index rather than (very expensively) checking out each commit, which, given 10k+ commits and 10k+ files in each commit (not to mention branches!), would take days or weeks.
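For anyone following along, here is a minimal sketch of what an index-only rewrite can look like. It is not taken from Richard's script: the helper name `build_filter_branch_command` and the example paths are hypothetical, and it drops a list of paths rather than applying a whitelist. It only builds the command line, so nothing is rewritten until you choose to run it.

```python
import shlex


def build_filter_branch_command(paths_to_drop):
    """Build argv for a `git filter-branch` run that removes the given
    paths from the index of every commit, without checking anything out."""
    if not paths_to_drop:
        raise ValueError("need at least one path to drop")
    quoted = " ".join(shlex.quote(p) for p in paths_to_drop)
    # The index filter is run by a shell once per commit, so it is one string.
    index_filter = "git rm -r --cached --ignore-unmatch -q -- " + quoted
    return [
        "git", "filter-branch",
        "--index-filter", index_filter,
        "--prune-empty",             # drop commits that become empty
        "--tag-name-filter", "cat",  # rewrite tags to point at new commits
        "--", "--all",               # act on every branch and tag
    ]


if __name__ == "__main__":
    # Hypothetical paths, purely for illustration.
    cmd = build_filter_branch_command(["docs", "usage/jsgui"])
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the resulting list to `subprocess.run` inside a scratch clone would perform the rewrite; the index filter touches only the staging area for each commit, which is where the speed-up over a tree filter comes from.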

I've put some comments on https://github.com/rdowner/brooklyn-repo-split/pull/1, esp. starting towards a pre-processing step.


--A


On 06/12/2015 00:18, Tony Su wrote:

An interesting project (porting and splitting a git repo including history).

If you haven't found the following article, it might be helpful. Looks
like what he's doing is simpler than your code (although may or may
not be better)

http://gbayer.com/development/moving-files-from-one-git-repository-to-another-preserving-history/

Tony

On Sat, Dec 5, 2015 at 3:46 PM, Richard Downer <[email protected]> wrote:

All,

Per the recent vote on this list, we have decided to split the
Brooklyn repository into a number of smaller modules.

With some of my colleagues, I've been working on a script to do this,
which preserves as much of the existing history, branches and tags as
possible.

You can find the script here:
https://github.com/rdowner/brooklyn-repo-split/tree/master

And the result of running the script:
https://github.com/rdowner/TEMP-brooklyn-dist
https://github.com/rdowner/TEMP-brooklyn-docs
https://github.com/rdowner/TEMP-brooklyn-library
https://github.com/rdowner/TEMP-brooklyn-server
https://github.com/rdowner/TEMP-brooklyn-ui

I'd be interested in your feedback!

--

One limitation of this script occurs when files are moved between
locations that turn into different repositories after the split. My
expectation is that files would suddenly appear in the new repository
without history.

Alex Heneveld has suggested a pre-processing phase which examines all
the files currently in each proposed subrepo, and then examines the
history to determine every filename that they have previously been
known as, and uses that in the whitelist for selecting the contents of
each subrepo. This would preserve history (although the subrepo
history would look a bit odd, as the early history of the repo would
have isolated files in random locations).

I haven't implemented Alex's suggestion, but the design of this script
would easily allow the results of the pre-processing phase to be
integrated.
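To make the pre-processing idea concrete, here is a rough sketch, assuming the rename history is available as text in the form that `git log -M --name-status --pretty=format:` prints (newest commit first, rename lines such as `R100<TAB>old<TAB>new`). The function names and file paths are hypothetical and this is not part of the script. It also ignores the case where an old path was later reused by an unrelated file.

```python
def parse_renames(name_status_output):
    """Extract (old_path, new_path) pairs from name-status output,
    keeping the order in which they appear (newest first)."""
    renames = []
    for line in name_status_output.splitlines():
        fields = line.split("\t")
        # Rename lines look like "R<score>\t<old>\t<new>".
        if len(fields) == 3 and fields[0].startswith("R"):
            renames.append((fields[1], fields[2]))
    return renames


def historical_names(current_files, renames):
    """Return every path the current files have been known as.

    Walking the renames newest-first means a chain a -> b -> c is
    followed backwards: c is known, so b is added, then a."""
    known = set(current_files)
    for old_path, new_path in renames:
        if new_path in known:
            known.add(old_path)
    return known


if __name__ == "__main__":
    # Hypothetical log excerpt, newest first.
    log = (
        "R100\tmid/A.java\tserver/A.java\n"
        "M\tmid/A.java\n"
        "R090\told/A.java\tmid/A.java\n"
    )
    for name in sorted(historical_names(["server/A.java"], parse_renames(log))):
        print(name)
```

The resulting set could then be used as the whitelist for a subrepo, so that commits touching the earlier locations are kept.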


Cheers
Richard.
