Oh, just for completeness -- moving to git is not just about the version
management, it's also:

1) all the scripts that currently do validations, etc.
2) what to do with svn:* properties
3) what to do with empty folders (git cannot track them).

I don't volunteer to solve these :)

Dawid
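
For point 3, one common workaround (a convention, not a git feature) is to
drop a placeholder file into each empty directory so git has something to
track. A minimal sketch, assuming an exported working tree on disk; the
`.gitkeep` name and the `add_placeholders` helper are made up for
illustration:

```python
import os
from pathlib import Path


def add_placeholders(root, name=".gitkeep"):
    """Create an empty placeholder file in every empty directory under root.

    Returns the sorted list of placeholder paths that were created.
    """
    created = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata so it is neither visited nor
        # counted as directory content.
        dirnames[:] = [d for d in dirnames if d not in (".git", ".svn")]
        if not dirnames and not filenames:
            placeholder = Path(dirpath) / name
            placeholder.touch()
            created.append(placeholder)
    return sorted(created)
```

Whether the build and validation scripts tolerate such extra files would
still have to be checked.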


On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <dawid.we...@gmail.com> wrote:

>
> Ok, give me some time and I'll see what I can achieve. Now that I actually
> wrote an SVN dump parser (validator and serializer) things are under much
> better control...
>
> I'll try to achieve the following:
>
> 1) selectively drop unnecessary stuff from history (cms/, javadocs/, JARs
> and perhaps other binaries),
> 2) *preserve* history of all core sources. So svn log IndexWriter has to
> go all the way back to when Doug was young and pretty. Ooops, he's
> still pretty of course.
> 3) provide a way to link git history with svn revisions. I would, ideally,
> include an "imported from svn:rev XXX" in the commit log message.
> 4) annotate release tags and branches. I don't care much about interim
> branches -- they are not important to me (please speak up if you think
> otherwise).
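
The marker from point 3 could be written and read back roughly like this. A
minimal sketch: the helper names are hypothetical, and the marker wording is
only the one proposed above, not a settled format.

```python
import re

# Matches the proposed marker when it sits on a line of its own.
SVN_REV_RE = re.compile(r"^imported from svn:rev (\d+)$", re.MULTILINE)


def add_svn_rev(message, rev):
    """Append the svn revision marker as the last line of a commit message."""
    return f"{message.rstrip()}\n\nimported from svn:rev {rev}\n"


def find_svn_rev(message):
    """Return the svn revision recorded in a commit message, or None."""
    match = SVN_REV_RE.search(message)
    return int(match.group(1)) if match else None
```

With the marker on its own line, a plain `git log --grep` on the marker text
should be enough to go from an svn revision to the git commit.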
>
> Dawid
>
> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <rcm...@gmail.com> wrote:
>
>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>> a move to git. I don't care if we disagree about JARs, I trust he will
>> do a good job and that is more important.
>>
>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com>
>> wrote:
>> >
>> > It's not true that nobody is working on this. I have been working on the
>> > SVN dump in the meantime. You would not believe how incredibly complex
>> > processing that (remote) dump is. Let me highlight a few key issues:
>> >
>> > 1) There is no "one" Lucene SVN repository that can be transferred to git.
>> > The history is a mess. Trunk, branches, tags -- all change paths at various
>> > points in history. Entire projects are copied from *outside* the official
>> > Lucene ASF path (when Solr, Nutch or Tika moved from the incubator, for
>> > example).
>> >
>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k commits.
>> > ASF's commit history in which those 50k commits live is 1.8 *million*
>> > commits. I think the git-svn sync crashes due to the sheer number of
>> > (empty) commits in between actual changes.
>> >
>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
>> > patch, for example, but there are others (the second largest is 190 megs,
>> > the third is 136 megs).
>> >
>> > 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>> > locally (including empty interim commits to cater for svn:mergeinfos) is
>> > 4G. If you strip the stuff like javadocs and side projects (Nutch, Tika,
>> > Mahout) then I bet the entire history can fit in 1G total. Of course
>> > stripping JARs is also doable.
>> >
>> > 5) There is lots of junk at the main SVN path so you can't just version the
>> > top-level folder. If you wanted to checkout /asf/lucene then the size of
>> > the resulting folder is enormous -- I terminated the checkout after I
>> > reached over 20 gigs. Well, technically you *could* do it, it'd preserve
>> > perfect history, but I wouldn't want to git co a past version that checks
>> > out all the tags, branches, etc. This has to be mapped in a sensible way.
>> >
>> > What I think is that all the above makes (straightforward) conversion to
>> > git problematic. Especially moving paths are a problem -- how to mark
>> > tags/branches, where the main line of development is, etc. This conversion
>> > would have to be guided and hand-tuned to make sense. This effort would
>> > only pay for itself if we move to git, otherwise I don't see the benefit.
>> > Paul's script is fine for keeping short-term history.
>> >
>> > Dawid
>> >
>> > P.S. Either the SVN repo at Apache is broken or the SVN is broken, which
>> > makes processing SVN history even more fun. This dump indicates Tika being
>> > moved from the incubator to Lucene:
>> >
>> > svnrdump dump -r 712381 --incremental
>> > https://svn.apache.org/repos/asf/ > out
>> >
>> > But when you dump just Lucene's subpath, the output is broken (last
>> > changeset in the file is an invalid changeset, it carries no target):
>> >
>> > svnrdump dump -r 712381 --incremental
>> > https://svn.apache.org/repos/asf/lucene > out
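
To see how many interim revisions in a dump file such as `out` are empty
(the concern in point 2), the dump can be scanned record by record. A rough
sketch, not the parser mentioned in this thread: it assumes the standard
dump layout of "Key: value" header blocks, each optionally followed by
`Content-length` bytes of body, and both function names are made up.

```python
import io


def read_headers(stream):
    """Read one 'Key: value' header block; return None at end of input."""
    headers = {}
    for raw in iter(stream.readline, b""):
        line = raw.rstrip(b"\r\n")
        if not line:
            if headers:
                return headers
            continue  # blank separator between records
        key, _, value = line.decode("utf-8", "replace").partition(": ")
        headers[key] = value
    return headers or None


def empty_revisions(stream):
    """Return the revision numbers that carry no node records.

    `stream` is a binary file object positioned at the start of a dump.
    """
    empty = []
    current = None      # revision currently being read
    has_nodes = False
    while True:
        headers = read_headers(stream)
        if headers is None:
            break
        # Skip the body; a real tool would avoid loading huge bodies at once.
        stream.read(int(headers.get("Content-length", 0)))
        if "Revision-number" in headers:
            if current is not None and not has_nodes:
                empty.append(current)
            current = int(headers["Revision-number"])
            has_nodes = False
        elif "Node-path" in headers:
            has_nodes = True
    if current is not None and not has_nodes:
        empty.append(current)
    return empty
```

Usage would be along the lines of `empty_revisions(open("out", "rb"))`.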
>> >
>> >
>> >
>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com>
>> wrote:
>> >>
>> >> If we move to git, stripping out jars seems to be an independent decision?
>> >> Can you even strip out jars and preserve history (i.e. not change
>> >> hashes and invalidate everyone's forks/clones)?
>> >> I did run across this:
>> >>
>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>> >>
>> >> -Yonik
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>> >>
>> >
>>
>>
>>
>
