On Mon, Apr 15, 2013 at 09:51:34PM +0100, Florian Rathgeber wrote:
> On 14/04/13 21:26, Anders Logg wrote:
> > On Fri, Apr 12, 2013 at 05:50:25PM +0100, Florian Rathgeber wrote:
> >> On 11/04/13 02:53, Florian Rathgeber wrote:
> >>> On 10/04/13 15:54, Anders Logg wrote:
> >>>> On Wed, Apr 10, 2013 at 12:42:13PM +0100, Florian Rathgeber wrote:
> >>>>> On 09/04/13 22:47, Florian Rathgeber wrote:
> >>>>>> On 09/04/13 20:14, Anders Logg wrote:
> >>>>>>> Another option would be git submodules. Florian suggested this to me
> >>>>>>> earlier.
> >>>>>>
> >>>>>> That's what I think would have been a good option for outsourcing the
> >>>>>> references. They are by far the biggest chunk of the FFC repository (in
> >>>>>> size) and only developers care about them, while everyone else has a
> >>>>>> much larger repository to clone which also takes up considerable disk
> >>>>>> space (51M at the moment).
> >>>>>>
> >>>>>> Having the references be a submodule means the
> >>>>>> test/regression/references directory would be a pointer to a particular
> >>>>>> revision (SHA1) of another repository. Each FFC revision would have a
> >>>>>> particular revision of the ffc-references repository associated with
> >>>>>> it, so there is no ambiguity. It would also have the advantage that if
> >>>>>> we would completely redesign the FFC testing infrastructure and wouldn't
> >>>>>> need the references any more we could simply get rid of the submodule
> >>>>>> and wouldn't have to carry around their burden in history forever.
> >>>>>>
> >>>>>> There's a few caveats though:
> >>>>>>
> >>>>>> 1) If we were doing this now we would need to rewrite the history
> >>>>>> again, completely strip the references folder and replace it by the
> >>>>>> submodule.
> >>>>>>
> >>>>>> 2) Syncing a git repository over to launchpad for automatic package
> >>>>>> building with the bzr builder is not possible if the repository has
> >>>>>> *ever* included a submodule in its history [1], but there are
> >>>>>> workarounds [2] (which can't be run as a BitBucket hook however).
> >>>>>>
> >>>>>> 3) Pull requests would be a bit more tricky since ffc-references and
> >>>>>> ffc would have to be always merged as a pair. For core developers with
> >>>>>> push access to the repositories this could probably be handled with a
> >>>>>> pre-commit hook.
> >>>>>>
> >>>>>> [1]: https://bugs.launchpad.net/bzr-git/+bug/402814
> >>>>>> [2]: https://bazaar.launchpad.net/~videolan/vlc/manual-bzr-import/view/head:/manual-bzr-import
> >>>>>
> >>>>> It appears we can't get anyone excited about a discussion of these
> >>>>> issues. Have we scared everyone away?
> >>>>>
> >>>>> What are your thoughts on the submodule for FFC references? If we decide
> >>>>> to rewrite again we should do it asap before people actually start
> >>>>> basing work off the new FFC repo.
> >>>>
> >>>> I think we should rewrite now and do the submodule thing. Then the
> >>>> references won't clutter the history and we are free to later move
> >>>> them somewhere else (like automatic CMake fetch if we decide to do
> >>>> that).
> >>>
> >>> I've done some research and there seem to be some options for splicing a
> >>> subdirectory into a submodule while keeping the correct association
> >>> throughout history, i.e. every revision of the main repo points to the
> >>> correct revision of the submodule:
> >>> http://thread.gmane.org/gmane.comp.version-control.git/109805/
> >>
> >> Couldn't get this working even after some fiddling.
> >>
> >>> http://thread.gmane.org/gmane.comp.version-control.git/164489/
> >>> http://thread.gmane.org/gmane.comp.version-control.git/164463/
> >>
> >> The full thread is at
> >> http://thread.gmane.org/gmane.comp.version-control.git/164386/
> >>
> >> I could get this to work, and it seems to do pretty much what we want:
> >> splits the subdirectory into a submodule (within the same repository!)
> >> and maintains the correct association by storing the submodule revision
> >> in the parent's index. It does however not create (and update) a
> >> .gitmodules file, so you have to know where the submodule is linked to
> >> the parent and it's slightly awkward to put it in place:
> >>
> >> $ git clone . test/regression/references
> >> $ rev=`git rev-parse :test/regression/references`
> >> $ ( cd test/regression/references && git reset --hard $rev )
> >>
> >> However it should be possible to add a .gitmodules file and then treat
> >> it in the normal way. To be able to push/pull the submodule tree it's
> >> also necessary to create a ref to it e.g.:
> >>
> >> $ git update-ref refs/test/references <sha>
> >>
> >> Note that this is deliberately *not* a branch ref (which live in
> >> refs/heads/), which means it won't be fetched by default. That means
> >> even though the references tree is in the repository, users don't invest
> >> the bandwidth to fetch it unless they explicitly configure it to (which
> >> developers who want to run regression tests will need to do).
> >
> > ok. I honestly can't follow the technical details here...
>
> The important points are:
> 1) FFC and the references are stored in one and the same repository
> 2) the references are not fetched (transferred) by default
> 3) from each FFC revision throughout history its associated references
> are reachable
> 4) checking out the references to test/regression/references is slightly
> awkward, but can easily be scripted; same for updating

ok.

> >>> Regarding the caveats from above: we're willing to accept 1), 2) I think
> >>> is not a big deal (I'm not even sure Johannes is using bzr builder?), so
> >>> the main thing is 3). Given that the history of the references isn't
> >>> really important, only the association, it's maybe not so scary. It's
> >>> just a bit more work maintaining 2 repositories, though most of it could
> >>> be scripted, at least for the benefit of the core devs.
> >>>
> >>> I've had another chat with Jed and he suggested using git-fat. He's the
> >>> author and it was specifically written for that use case: keeping a
> >>> unified repository/history but storing large (optional) files outside of
> >>> .git/objects to keep the repository slim. The downside is that you then
> >>> need a separate central location where these files are kept. git-fat
> >>> manages them for you, so running an rsync daemon on the FEniCS web
> >>> server might already do the trick.
> >>
> >> After a closer look at git-fat I think it's not perfect for our use
> >> case: the actual files on disk are only stubs (which only contain the 40
> >> byte SHA1) and are replaced by the actual big blobs by a smudge/clean
> >> filter, but *only for certain operations*. Unfortunately diff is not one
> >> of them and I think it's the one we care about: being able to view the
> >> diff between the output and the old reference before updating. If we
> >> don't care about the diff we could just as well only store a hash of the
> >> reference.
> >
> > We need to be able to view the diff - if something changed, we must be
> > able to spot if it is a harmless formatting fix.
> >
> > And note that it's not just the text diffs that are important. We also
> > store data that come out of running the generated code. That is also
> > stored and then checksums aren't enough.
>
> OK. I think that rules out git-fat.

ok.

> >>> We then went on to discuss whether we could in fact leverage git in the
> >>> regression test suite itself: there is no inherent reason why the
> >>> references actually need to exist as files in the work tree. An
> >>> identifiable loose object in the repository would be sufficient. I'll
> >>> forward the log so you can get the idea.
> >>
> >> Are there any plans for changing the FFC testing infrastructure?
> >
> > Martin has been doing some work on using .json for storing the
> > reference data.
>
> I thought the reference data was an addition to testing the generated
> headers?

Yes, we do both. Both the code itself and the output from running the
code are tested.

> > Considering that there doesn't seem to be a perfect git solution for
> > storing the references at this point, my suggestion would be to store
> > the references on the web server with rsync and a small bash script
> > that will download (and upload) the appropriate references. The script
> > would look for data in a directory named with the git hash of the
> > youngest available ancestor.
>
> The submodule solution isn't perfect, but it has the main advantage that
> code and references are stored in the same place. The rsync solution you
> describe seems feasible, but introduces another place where data is kept.

It also has the advantage that it's something I can slap together in a
simple bash script. I don't know enough about git to handle it using
the submodule approach, and I don't know if I should ask you to spend
another 2 weeks developing the script(s) for it. :-)

--
Anders

_______________________________________________
Mailing list: https://launchpad.net/~fenics
Post to     : fenics@lists.launchpad.net
Unsubscribe : https://launchpad.net/~fenics
More help   : https://help.launchpad.net/ListHelp
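
The lookup step of the rsync approach described in this message (find the
directory named with the git hash of the youngest available ancestor) could
be sketched roughly as follows. This is only an illustration under stated
assumptions, not the script that was eventually written: the function name
find_reference_rev and the layout (one directory per commit hash under a
locally reachable root) are hypothetical. It relies on git rev-list printing
the commits reachable from a revision newest first; on a history with merges
that ordering follows commit dates, so "youngest" is approximate there.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print the newest commit,
# starting at revision $1 and walking back through its ancestors, for
# which a directory named after the full commit hash exists under $2.
# Returns non-zero if no ancestor has reference data.
find_reference_rev() {
    start_rev=$1
    refs_root=$2
    # git rev-list prints $start_rev itself first, then its ancestors.
    for rev in $(git rev-list "$start_rev"); do
        if [ -d "$refs_root/$rev" ]; then
            echo "$rev"
            return 0
        fi
    done
    return 1
}

# Possible use, assuming the per-hash directories are mirrored under
# /path/to/mirror (an assumed location):
#   rev=$(find_reference_rev HEAD /path/to/mirror) &&
#       rsync -a "/path/to/mirror/$rev/" test/regression/references/
```

For a remote rsync server the existence check would have to query the server
instead of testing a local directory; that part is left out here.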