I very much support moving to all-submodules. In fact, I argued for all-submodules when we made the half-submodules transition last year. Being able to easily check out a consistent and complete source code tree in a repeatable way is extremely important.
Checking out by date "works" if you have dated history in your git reflog. For example, see:

http://stackoverflow.com/questions/6990484/git-checkout-by-date

In general, git commits are *not* time ordered, so asking for the version at a particular time is not well-defined across different working repositories.

The GHC HQ buildbots dump fingerprints in a form that is usable directly with fingerprint.py. You can get these fingerprints from the ghc-builds@ archive. Unfortunately there was a large gap after MSR moved buildings where our builds did not run, but things are more or less working now.

I believe Ben's buildbot package dumps fingerprints in a form that needs to be massaged before fingerprint.py can deal with it.

Geoff

On 06/05/2013 11:32 AM, Niklas Larsson wrote:
> When I was fiddling with having to rollback everything to a known good
> state I patched sync-all to checkout all the repos to the state they
> were in on a certain date, it's pretty naive, but it should be usable
> for doing manual bisecting at least. I can't find the old mailing list
> archives, so I attach the patch here.
>
> Niklas
>
>
> 2013/6/5 Austin Seipp <ase...@pobox.com>
>
> (Warning: incoming answer, followed by a rant.)
>
> Base is not a submodule, meaning that there is essentially no way to
> automatically check it back out to the "exact same state" it was in,
> given some specified GHC commit - the commit IDs are not tracked.
>
> At this point, you are basically on your own. You'll have to manually
> checkout libraries/base to a specific commit that occurred 'around'
> the same time as the GHC commit. In this case, that means looking
> through whatever commits hit HEAD on May 7th:
>
> $ cd libraries/base
> $ git log --until="May 7th"
>
> The resulting list will show you what happened up to May 7th. Take the
> latest commit in that list, and check out base to that revision.
> Any commits afterward happened on May 8th or later:
>
> $ git checkout -b temporary-io-fix <sha1 of latest May 7th commit>
>
> You're going to need to do this for every module that is not tracked
> as a submodule. Most of the repositories are very low-activity. base &
> testsuite are going to be the annoying ones.
>
> You'll have to continue this 'manual bisection' by hand, with a very
> hefty dose of frustrating trial-and-error, in my experience.
>
> There is a secondary alternative. GHC has a script called
> 'fingerprint.py' (in utils/fingerprint/) which is somewhat designed to
> work around this deficiency (very poorly.) This script basically dumps
> out a text file, containing a key/value pair mapping every repository
> to its current HEAD commit. It can then take that text file and
> automatically do 'git checkout' for you in every repo. The idea is you
> can take fingerprints of the tree, save the results, and cleanly check
> out to some state later.
>
> The GHC build bots run by Ben L.'s "Buildbox" library automatically
> run the 'fingerprint.py' script during nightly-builds, from what I
> remember. It may be possible to just look in the ghc-builds archives,
> and steal some fingerprints from the last month off one of the
> buildbots. I don't know who maintains the individual bots; perhaps you
> can ask the list. However, this will at best give you a 1-day level of
> granularity, rather than commit level granularity, which is still
> rather unsatisfying.
>
> ------------- Answer over, rant begins. ---------------------
>
> I know we had this discussion sometime recently I think, but can
> someone *please* explain why we are in this situation of half
> submodules, half random-floating-git-repository-checkouts? It's
> terrible. I'm frankly surprised we've even been doing it this long,
> over a year or more? It is literally the worst of submodules, and
> free-standing-repositories put together, with none of the advantages
> of either.
>
> Free-standing repos are attractive because they are just there, and
> you don't have to 'maintain' them (sort of.) Submodules are attractive
> because they identify the critical points in which your repositories
> depend on each other. We have neither benefit right now, clearly.
>
> In particular, this makes it impossible to use tools like 'git bisect'
> which is *incredibly* useful for just these exact cases. Hell, you can
> even make 'git bisect' work almost 100% automatically with a tiny bit
> of shell scripting.
>
> http://mainisusuallyafunction.blogspot.com/2012/09/tracking-down-unused-variables-with.html
>
> You could just instead have a script that built the compiler, and ran
> the built compiler on your testcase, after every bisection. Wouldn't
> it be *great* to have something like that Just Work? A tool like this
> could potentially boil down Kazu's bug almost automatically for
> example, with little-to-no frustrating intervention.
>
> And even now, looking at the repository listing of what is in
> libraries/, that are not submodules, I really see no reason why more -
> or even all - of them cannot be submodules. Is it a workflow issue of
> some sort? That's what I'm thinking at this point, but I also don't
> think it could be any worse than it is now.
>
> Realistically, very few libraries GHC needs for bootstrapping seem to
> change that much. unix, integer-simple, haskeline and filepath for
> example change *extremely* infrequently, but all are free-standing.
> Why? In the event they were submodules, would anything actually be
> lost?
>
> The maintainer - that is, not GHC HQ - would still 'own' the official
> repository. They can make changes to it. But if there is a necessity
> to pull that in for GHC (feature request, bug fix, random thing) it
> can be done by updating the submodule pointer to the new commit. But
> this must happen explicitly by a GHC committer.
> In the event they update the submodule pointer, they should also
> obviously make sure the build still works.
>
> That means we have to update the submodule pointers ourselves if
> things change. That sucks I guess, but really, aside from base and
> testsuite, the two most frequently changing repositories, is that
> *actually* going to cost us a lot of work?
>
> And even if it does cost us work, I'll speak for myself: I will gladly
> pay for that work and do it all myself if it means I can actually
> bisect and actually roll back my tree to some point to fix things -
> without needing to prepare for it months in advance using hacks. Like
> creating thousands of fingerprints, using fingerprint.py every day
> when people make commits (no, I haven't done this, but it could be
> done, and I really don't want to do it.)
>
> Long-term reproducible builds are, IMO, a must for any project.
> *Especially* a project of our size. *Especially* a compiler of all
> things. But as it stands, when you build GHC, you can probably
> reproduce *today's* results and *today's* bugs. Last month's results?
> Last year's? Finding the difference between those months ago and today?
> Good luck - you will need it.
>
> On Tue, Jun 4, 2013 at 8:07 PM, Kazu Yamamoto <k...@iij.ad.jp> wrote:
> > Hi,
> >
> > Andreas and I found that the new IO manager is not working properly in
> > the current GHC head. I'm sure that it worked well at least on May 7.
> >
> > We need to narrow the range of commits, so I did:
> >
> > % git checkout bb2795db36b36966697c228315ae20767c4a8753
> > % git submodule update
> >
> > But this does not checkout proper submodules. For instance,
> > libraries/base has newer commits. And of course, building fails.
> >
> > Please tell us how to checkout proper submodules against a specific
> > GHC tree.
> >
> > --Kazu
>
> --
> Regards,
> Austin - PGP: 4096R/0x91384671

_______________________________________________
ghc-devs mailing list
ghc-devs@haskell.org
http://www.haskell.org/mailman/listinfo/ghc-devs
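
For readers of the archive: the fingerprint idea discussed in this thread (a text file mapping every repository to its current HEAD commit, replayed later with 'git checkout') can be sketched roughly as below. This is only an illustration and is not the real utils/fingerprint/fingerprint.py. The `path|sha` line format, the `#` comment convention, and every function name here are assumptions made up for the example.

```python
import subprocess
from pathlib import Path
from typing import Dict, List


def head_commit(repo: Path) -> str:
    """Return the commit that HEAD points to in `repo` (runs `git rev-parse HEAD`)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def take_fingerprint(root: Path, repos: List[str]) -> Dict[str, str]:
    """Map each repository path (relative to `root`) to its current HEAD commit."""
    return {rel: head_commit(root / rel) for rel in repos}


def dump_fingerprint(fp: Dict[str, str]) -> str:
    """Serialize as one `path|sha` line per repository (hypothetical format)."""
    return "".join(f"{path}|{sha}\n" for path, sha in sorted(fp.items()))


def parse_fingerprint(text: str) -> Dict[str, str]:
    """Inverse of dump_fingerprint; blank lines and `#` comment lines are skipped."""
    fp: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        path, sha = line.split("|", 1)
        fp[path] = sha
    return fp


def restore_fingerprint(root: Path, fp: Dict[str, str]) -> None:
    """Check out the recorded commit in every repository (detached HEAD)."""
    for path, sha in fp.items():
        subprocess.run(["git", "checkout", sha], cwd=root / path, check=True)
```

Under these assumptions, something like `take_fingerprint(Path("."), [".", "libraries/base", "testsuite"])` would be saved with `dump_fingerprint` after a known-good build, and `restore_fingerprint` would replay it later. As noted in the thread, this only recovers states that someone recorded in advance, which is exactly the limitation that tracked submodules remove.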