On Mon, Mar 25, 2013 at 11:32:35AM +0000, Florian Rathgeber wrote:
> On 25/03/13 10:50, Garth N. Wells wrote:
> > On 25 March 2013 08:31, Florian Rathgeber <florian.rathge...@gmail.com> wrote:
> >> On 22/03/13 09:59, Johan Hake wrote:
> >>> On 03/22/2013 10:57 AM, Anders Logg wrote:
> >>>> On Fri, Mar 22, 2013 at 10:52:25AM +0100, Johan Hake wrote:
> >>>>> On 03/22/2013 10:36 AM, Anders Logg wrote:
> >>>>>> On Fri, Mar 22, 2013 at 10:32:50AM +0100, Johan Hake wrote:

> >>>>>>>> Not exactly:
> >>>>>>>>
> >>>>>>>> - Meshes in demos --> remove (already done)

> >>>>>>> I suggest we keep these. There aren't any big files anyhow, are there?

> >>>>>> They have already been removed and there's a good system in place for handling them. Keeping the meshes elsewhere will encourage use of the mesh gallery and keeping better track of which meshes to use. There were lots of meshes named 'mesh.xml' or 'mesh2d.xml' which were really copies of other meshes used in other demos, some of them were gzipped, some not, etc. That's all very clean now. Take a look at how it's done in trunk. I think it looks quite nice.

> >>>>> Nice and clean, but it really is just 30 meshes. Duplications are mostly related to dolfin_fine.xml.gz, of which there are 7 copies, and that file is 86K.

> >> If they're bit-by-bit identical git will only store a single copy in the repository anyway, regardless of how many copies you happen to have in the working tree.

> > Clever.

> >> On the note of storing gzipped meshes: do they change frequently?

> > No.

> >> Why are they stored gzipped?

> > Habit. It's not good for version control.

> With a bit of trickery we might even be able to convert all those gzipped meshes, i.e. unzip them in each revision and only keep the xml in the repo (retrospectively for the entire history).

> >> Compressed files have a few issues:
> >> 1) they're treated as binary, i.e. any change requires a new copy of the entire file to be stored
> >> 2) they can't be diffed
> >> 3) git compresses its packfiles anyway, so there is little (if any) space gain through compression

> >>>>>> Most of the example meshes are not that big, but multiply that by 30 and then some when meshes are moved around or renamed.

> >>>>> I just question if it is worth it. Seems convenient to just have the meshes there.

> >>>> Keeping the meshes there will put a limit on which demos we can add. I think it would be good to allow for more complex demos requiring bigger meshes (not necessarily run on the buildbot every day).

> >>> Ok.

> >>>>> If we keep them out of the repo I think we should include some automagic downloading when building the demos.

> >>>> Yes, or at least a message stating: "You have not downloaded demo data. Please run the script foo."

> >>>>> Also, should we rename the script to download-demo-meshes, or something more descriptive, as this is what that script now basically does?

> >>>> It is not only meshes, but also markers and velocity fields. Perhaps it can be renamed download-demo-data?

> >>> Sounds good.
> >>>
> >>> Johan
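download-demo-data works for me too. The check at demo build time can probably be something as simple as the sketch below (the script name, data directory and file name are just placeholders here, and the real check would presumably go into the demo build scripts rather than a standalone snippet):

  # Sketch only: refuse to run a demo if the demo data has not been fetched.
  if [ ! -f data/dolfin_fine.xml ]; then
      echo "You have not downloaded demo data. Please run ./download-demo-data."
      exit 1
  fi
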
> >> I did some more experimenting:
> >>
> >> 1) Repository size: there is quite some mileage in repacking the repos with the following steps:
> >>
> >> $ git reflog expire --expire=now --all

> git keeps track of how branch HEADs move and does not garbage collect these revisions. This information is kept for 90 days by default. Tell git to clear this history and "release" it for garbage collection.

> >> $ git gc --aggressive --prune=now

> Invoke git's garbage collection and tell it to aggressively remove all objects from packfiles which are no longer reachable in the DAG.

> >> $ git repack -ad

> Rewrite the packfiles and remove all redundant packs.

Wow. I didn't understand much of that, but it sounds like a good thing...
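Just to check that I follow, the full sequence run inside a local copy of the converted repository would then be something like this (untested on my end, and the directory name is just an example):

  $ cd dolfin             # local copy of the converted git repository
  $ du -sh .git           # size before
  $ git reflog expire --expire=now --all
  $ git gc --aggressive --prune=now
  $ git repack -ad
  $ du -sh .git           # size after
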
> >> e.g. DOLFIN: 372MiB -> 94MiB

> > Wow. What do these commands do?

> >> 2) Stripping out the files suggested by Anders (https://gist.github.com/alogg/5213171#file-files_to_strip-txt) brings the repo size down to 172MiB and 24MiB after repacking.

> > I like this. It will make cloning on a slow connection much better.

Trimming the repository down to 24 MiB sounds very tempting.

> >> 3) I haven't yet found a reliable way to migrate feature branches to the filtered repository. Filtering the repository rewrites its history and therefore changes/invalidates all commit ids (SHA1s), and therefore also the marks files created when initially converting the repository. There are 2 possible options for filtering the repository during conversion:
> >>
> >> a) bzr fast-import-filter: seems to be a pain to use with many files (each path needs to be passed individually as an argument) and seems not to support writing marks files, so I haven't tried it.
> >>
> >> b) git_fast_filter: when using it to filter the converted git repo, the exported marks file in the last step contains 83932 marks instead of the expected 14399 - I can't say why. Unfortunately I haven't been able to use it directly in the conversion pipeline, since it's not compatible with a bzr fast-export stream. That's probably fixable, but I can't estimate how much work it would be to fix it since I'm not familiar enough with the details of the fast-import format.
> >>
> >> TL;DR: Repacking repos saves a lot of space already without stripping large files. Stripping files is easy to do and saves considerably more space, but I haven't been able to reliably import feature branches into a filtered repository.

> > How about we give everyone a period within which to merge code on Launchpad, then we don't worry about feature branches and marks in the conversion? Small changes can always come later in the form of patches.

> Yes, that's an option. Git has very good support for importing patch series, and maybe bzr can export patch series in the git am format. The other alternative is importing the feature branch into the non-filtered git repository and transplanting it to the filtered one via an interactive rebase. It's just a bit more work than I would have hoped for.

I like Garth's suggestion. How about we set a deadline for Friday for any pending merges, in combination with being a bit more accommodating with getting the merges in place? Then we freeze the repositories over the weekend and do the conversions/stripping. After that, anyone wishing to merge code in will need to clone a fresh copy of the new git repository and patch manually.

--
Anders
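PS: If someone does end up with a pending feature branch after the freeze, one concrete way to get it in, as an alternative to the interactive rebase route Florian mentions, would be roughly the following (repository and branch names below are made up, and any patch touching the stripped files would of course need fixing up by hand):

  $ cd dolfin-unfiltered                 # non-filtered git conversion containing the branch
  $ git format-patch master..my-feature  # writes one git-am-style patch file per commit
  $ cd ../dolfin                         # fresh clone of the new, filtered repository
  $ git am ../dolfin-unfiltered/00*.patch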