9 hours ago, Matthew Flatt wrote:
> At Wed, 22 May 2013 14:50:41 -0400, Eli Barzilay wrote:
> > That's true, but the downside of changing the structure and having
> > files and directories move post structure change will completely
> > destroy the relevant edit history of the files, since it will not
> > be carried over to the repos once it's split.
>
> It's possible that we're talking past each other due to me not getting
> this point.

(Obligatory re-disclaimer: I consider the problem with forcing people
to change their working environment much more severe.)

> Why is it not possible to carry over history?
>
> The history I want corresponds to `git log --follow' on each of the
> files that end up in a repository. I'm pretty sure that such a
> history of commits can be generated for any given set of files, even
> if no ready-made tool exists already (i.e., 'git' is plenty flexible
> that I can script it myself).
>
> Or maybe I'm missing some larger reason?

The thing to remember is just how simple git is...  There's no magical
way to carry over a history artificially -- it's whatever is in the
commits.  To make this more concrete (and more verbose), in this
context the point is that git filter-branch is a simple tool that
basically replays the complete history, allowing you to plant various
hooks to change the directory structure, commit messages or whatever.
The new history is whatever new commits are in the revised repository,
with no way to make up a history with anything else.

Now, to make my first point about the potential loss of history that
is inherent in the process -- say that you want to split out a
"drracket" repo in a naive way: taking just that one directory.  Since
it's done naively, the resulting repository will not have the
"drscheme" directory and its contents, which means that you lose all
history of files that happened there.

To try that (in a fresh clone, of course) -- first, look at the
history of a random file in it:

  F=collects/drracket/private/app.rkt
  git log --format='----%n%h %s' --name-only --follow -- "$F"

Now do the revision:

  S=collects/drracket
  git filter-branch --prune-empty --subdirectory-filter "$S" -- --all

And look at the same log line again, the history is gone:

  git log --format='----%n%h %s' --name-only --follow -- "$F"

If you look at the *new* file, you do see the history, but the
revisions made in "drscheme" are gone:

  git log --format='----%n%h %s' --name-only --follow -- private/app.rkt

In any case, this danger is there no matter what, especially in our
case since code has been moving around in the "racket" switch.  I
*hope* that most of it will be simple: like carrying along the
"drscheme" directory with "drracket", the "scheme" and "mzlib" with
"racket", etc.  Later on, if these things move to "compat" packages,
the irrelevant directories get removed from the repo without
surgeries, so the history will still be there.

This shows some of the tricks that might be involved in the current
switch: if you'd want to have some "compat" package *now*, the right
thing to do would be:

  * do a simple filter-branch to extract "drscheme" (and other such
    collections) in a new repository for "compat"

  * for "drracket": do a filter-branch that keeps *both* directories
    in, then commit a removal of "drscheme".  (Optionally, use rebase
    to move the deletion backward...)

Going back to the repo structure change that you want, the reason I
said that doing moves between the package directories post-restructure
is destructive should be clear now: say that you move collects/A/x
into foo/A/x as part of the restructure.  Later you realize that A/x
should go into the bar package instead so you just move it to
bar/A/x.  The history is now in, including the rename, but later on
when bar is split into a separate repo, the history of the file is
gone.
Instead, it appears in the foo repository, ending up being deleted.

One way to get around this is to avoid moving the file -- instead, do
another filter-branch surgery.  This will be a mess since each such
change will mean rebuilding the repository with all the pain that this
implies.

Another way to get around it is to keep track of these moving commits,
and when the time comes to split into package repos, you first do
another surgery on the whole repo which moves foo/A/x to bar/A/x for
all of the commits before the move (not after, since that could lead
to other problems), and then do the split.  This might work, but
besides being very error-prone, it means doing the same kind of
file-movement tracking that I'm talking about anyway.

So take this all as saying that the movement of files between packages
needs to be tracked anyway -- but with my suggestion the movement is
delayed until it's known to be final before the repo split, which
makes it more robust overall.

----

But really, the much more tempting aspect for me is that this can be
done now -- if you give me a list of packages and files, I can already
do the movement script.

Actually, in an attempt to tempt you more, here's what I can do now
(as in the very near future):

Start from the list of directories/files in your min repo as a
specification of the contents of the core package, and decide that
everything else is in another "everything-else" package.  (Since
there's no actual file movements, it is cheap to use temporary names
and partial specifications.)

Then, change how the build works on the main machine (leave the other
machines as is for now): after the initial few steps of updating
version files etc the script doesn't use a repo -- it uses just the
exported directory.
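
To make the "specification" idea a bit more concrete, here is a rough
sketch of what a partition script over an exported directory might
look like.  Everything in it is an assumption made for illustration:
the function name split_packages, the one-path-per-line spec file, and
the demo paths are hypothetical, and the package directories are
written to a separate output directory rather than into $PLTHOME
itself.

```shell
#!/bin/sh
# Hypothetical helper (not an existing Racket build script).
# usage: split_packages EXPORT_DIR CORE_SPEC OUT_DIR
#   CORE_SPEC: text file, one path per line (relative to EXPORT_DIR),
#   naming the files/directories of the core package.  Anything not
#   named there ends up in the "everything-else" package.
split_packages() {
  export_dir=$1 spec=$2 out=$3
  mkdir -p "$out/core" "$out/everything-else" || return 1

  # 1. Move each path named in the spec into core/, keeping its
  #    relative location.
  while IFS= read -r entry; do
    [ -n "$entry" ] || continue                # skip blank lines
    if [ ! -e "$export_dir/$entry" ]; then
      echo "spec names a missing path: $entry" >&2
      return 1
    fi
    mkdir -p "$out/core/$(dirname "$entry")" &&
      mv "$export_dir/$entry" "$out/core/$entry" || return 1
  done < "$spec"

  # 2. Every regular file still left belongs to everything-else.  The
  #    list is snapshotted first so the tree is not modified while find
  #    is walking it.  (Symlinks and file names containing newlines are
  #    not handled in this sketch.)
  list=$(mktemp) || return 1
  ( cd "$export_dir" && find . -type f ) > "$list"
  while IFS= read -r f; do
    f=${f#./}
    mkdir -p "$out/everything-else/$(dirname "$f")" &&
      mv "$export_dir/$f" "$out/everything-else/$f" || return 1
  done < "$list"
  rm -f "$list"
}

# Demo on a throwaway tree (all paths below are made up).
demo=$(mktemp -d)
mkdir -p "$demo/export/collects/racket" \
         "$demo/export/collects/drracket" \
         "$demo/export/src"
echo core  > "$demo/export/collects/racket/main.rkt"
echo extra > "$demo/export/collects/drracket/main.rkt"
echo core  > "$demo/export/src/README"
printf '%s\n' collects/racket src > "$demo/core.spec"

split_packages "$demo/export" "$demo/core.spec" "$demo/pkgs"
( cd "$demo/pkgs" && find . -type f | sort )
```

Since only files are moved in the second step, the original
directories are left behind empty, and changing which package a file
belongs to is just an edit to the spec file, with no movement in the
repository.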

So after it exports the directory for building, the main machine will:

  - run the script to get the package directories, so you get
    something like (in $PLTHOME, wherever the build works):

      collects  \
      doc        \ all of these
      man        / are empty
      src       /
      core/collects
      core/man
      core/src
      everything-else/collects

  - it now moves core/* up a level (and removes the empty "core"
    directory)

  - do the regular build: executables + raco setup

  - next, move everything-else/* up a level too

  - run another setup

This means that now the build makes sure that the dependencies are
fine: that the core doesn't depend on everything-else.

Later on, we can split another package out from everything-else, and
insert it into the above sequence: build the core, add P, run setup,
add everything else, run a final setup.  It can even get more
sophisticated:

  - build core,
  - add P1, setup, move the built P1 out,
  - add P2, setup, move the built P2 out,
  - add everything-else and the built P1 & P2, run a final setup

Yes, this is duplicating the dependency info between the packages, but
this is all done temporarily (and for a small number of packages)
until the proper package-based build is working and replaces it.

In other words -- not only is my suggestion implementable now, it
allows the project to proceed faster: you can go on with doing the
package build, while everyone needs to deal with respecting
dependencies (deciding on which package a file goes with, avoiding
breaking these dependencies).

--
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!
_________________________
  Racket Developers list:
  http://lists.racket-lang.org/dev