Oh, just for completeness -- moving to git is not just about the version management, it's also:
1) all the scripts that currently do validations, etc.
2) what to do with svn:* properties
3) what to do with empty folders (not available in git).

I don't volunteer to solve these :)

Dawid

On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:

>
> Ok, give me some time and I'll see what I can achieve. Now that I actually
> wrote an SVN dump parser (validator and serializer) things are under much
> better control...
>
> I'll try to achieve the following:
>
> 1) selectively drop unnecessary stuff from history (cms/, javadocs/, JARs
> and perhaps other binaries),
> 2) *preserve* history of all core sources. So svn log IndexWriter has to
> go back all the way back to when Doug was young and pretty. Ooops, he's
> still pretty of course.
> 3) provide a way to link git history with svn revisions. I would, ideally,
> include a "imported from svn:rev XXX" in the commit log message.
> 4) annotate release tags and branches. I don't care much about interim
> branches -- they are not important to me (please speak up if you think
> otherwise).
>
> Dawid
>
> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <rcm...@gmail.com> wrote:
>
>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>> a move to git. I don't care if we disagree about JARs, I trust he will
>> do a good job and that is more important.
>>
>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:
>> >
>> > It's not true that nobody is working on this. I have been working on the SVN
>> > dump in the meantime. You would not believe how incredibly complex the
>> > process of processing that (remote) dump is. Let me highlight a few key
>> > issues:
>> >
>> > 1) There is no "one" Lucene SVN repository that can be transferred to git.
>> > The history is a mess. Trunk, branches, tags -- all change paths at various
>> > points in history.
>> > Entire projects are copied from *outside* the official
>> > Lucene ASF path (when Solr, Nutch or Tika moved from the incubator, for
>> > example).
>> >
>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k commits.
>> > ASF's commit history in which those 50k commits live is 1.8 *million*
>> > commits. I think the git-svn sync crashes due to the sheer number of (empty)
>> > commits in between actual changes.
>> >
>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
>> > patch, for example, but there are others (the second larger is 190megs, the
>> > third is 136 megs).
>> >
>> > 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>> > locally (including empty interim commits to cater for svn:mergeinfos) is 4G.
>> > If you strip the stuff like javadocs and side projects (Nutch, Tika, Mahout)
>> > then I bet the entire history can fit in 1G total. Of course stripping JARs
>> > is also doable.
>> >
>> > 5) There is lots of junk at the main SVN path so you can't just version the
>> > top-level folder. If you wanted to checkout /asf/lucene then the size of the
>> > resulting folder is enormous -- I terminated the checkout after I reached
>> > over 20 gigs. Well, technically you *could* do it, it'd preserve perfect
>> > history, but I wouldn't want to git co a past version that checks out all
>> > the tags, branches, etc. This has to be mapped in a sensible way.
>> >
>> > What I think is that all the above makes (straightforward) conversion to git
>> > problematic. Especially moving paths are a problem -- how to mark tags/
>> > branches, where the main line of development is, etc. This conversion would
>> > have to be guided and hand-tuned to make sense. This effort would only pay
>> > for itself if we move to git, otherwise I don't see the benefit. Paul's
>> > script is fine for keeping short-term history.
>> >
>> > Dawid
>> >
>> > P.S.
>> > Either the SVN repo at Apache is broken or the SVN is broken, which
>> > makes processing SVN history even more fun. This dump indicates Tika being
>> > moved from the incubator to Lucene:
>> >
>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>> >
>> > But when you dump just Lucene's subpath, the output is broken (last
>> > changeset in the file is an invalid changeset, it carries no target):
>> >
>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>> >
>> >
>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>> >>
>> >> If we move to git, stripping out jars seems to be an independent decision?
>> >> Can you even strip out jars and preserve history (i.e. not change
>> >> hashes and invalidate everyone's forks/clones)?
>> >> I did run across this:
>> >>
>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>> >>
>> >> -Yonik
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
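
For illustration, here is a minimal Python sketch of three points from this thread: dropping unwanted paths from history, tagging each commit message with its SVN revision, and keeping empty folders alive in git. Everything in it is an assumption: the names `keep_path`, `annotate_message` and `placeholder_files` are hypothetical, the drop rules are only a guess at what "cms/, javadocs/, JARs" might look like in practice, and `.gitkeep` is a common convention rather than a git feature. It is not the dump parser mentioned above.

```python
from typing import Iterable, List

# Hypothetical drop rules, loosely modelled on "cms/, javadocs/, JARs".
DROP_DIRS = {"cms", "javadocs"}
DROP_SUFFIXES = (".jar",)


def keep_path(path: str) -> bool:
    """Return True if a repository path should survive the conversion."""
    lowered = path.lower()
    parts = [part for part in lowered.split("/") if part]
    # Drop anything that lives under one of the unwanted directories.
    if any(part in DROP_DIRS for part in parts):
        return False
    # Drop binaries by file extension.
    return not lowered.endswith(DROP_SUFFIXES)


def annotate_message(message: str, svn_rev: int) -> str:
    """Append a trailer that links the git commit back to its SVN revision."""
    return f"{message.rstrip()}\n\nimported from svn:rev {svn_rev}\n"


def placeholder_files(dirs: Iterable[str], files: Iterable[str]) -> List[str]:
    """Return a placeholder path for every directory that holds no file.

    git does not track empty directories, so a conversion has to either
    drop them or add a placeholder file (assumed here: ".gitkeep").
    """
    file_list = list(files)
    placeholders = []
    for directory in sorted(dirs):
        prefix = directory.rstrip("/") + "/"
        if not any(f.startswith(prefix) for f in file_list):
            placeholders.append(prefix + ".gitkeep")
    return placeholders
```

In a real conversion these checks would run once per node and per revision of the dump. Whether to filter by directory name, by extension, or by size is a policy choice that would still need to be agreed on the list.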