Let's just make some JIRA issues. I'm not worried about volunteers for any
of it yet, just a direction we agree upon. Once we know where we are going,
we generally don't have a big volunteer problem. We haven't heard from Uwe
yet, but it really does seem like moving to Git makes the most sense.

I'm certainly willing to spend some free time on this.

- Mark

On Tue, Dec 15, 2015 at 1:22 PM Dawid Weiss <dawid.we...@gmail.com> wrote:

>
> Oh, just for completeness -- moving to git is not just about the version
> management, it's also:
>
> 1) all the scripts that currently do validations, etc.
> 2) what to do with svn:* properties
> 3) what to do with empty folders (not available in git).
>
> I don't volunteer to solve these :)
>
> Dawid
>
>
> On Tue, Dec 15, 2015 at 7:09 PM, Dawid Weiss <dawid.we...@gmail.com>
> wrote:
>
>>
>> Ok, give me some time and I'll see what I can achieve. Now that I
>> actually wrote an SVN dump parser (validator and serializer) things are
>> under much better control...
>>
>> I'll try to achieve the following:
>>
>> 1) selectively drop unnecessary stuff from history (cms/, javadocs/, JARs
>> and perhaps other binaries),
>> 2) *preserve* history of all core sources. So svn log IndexWriter has to
>> go all the way back to when Doug was young and pretty. Ooops, he's
>> still pretty of course.
>> 3) provide a way to link git history with svn revisions. I would,
>> ideally, include an "imported from svn:rev XXX" in the commit log message.
>> 4) annotate release tags and branches. I don't care much about interim
>> branches -- they are not important to me (please speak up if you think
>> otherwise).
>>
>> Dawid
>>
>> On Tue, Dec 15, 2015 at 7:03 PM, Robert Muir <rcm...@gmail.com> wrote:
>>
>>> If Dawid is volunteering to sort out this mess, +1 to let him make it
>>> a move to git. I don't care if we disagree about JARs, I trust he will
>>> do a good job and that is more important.
>>>
>>> On Tue, Dec 15, 2015 at 12:44 PM, Dawid Weiss <dawid.we...@gmail.com>
>>> wrote:
>>> >
>>> > It's not true that nobody is working on this. I have been working on the
>>> > SVN dump in the meantime. You would not believe how incredibly complex
>>> > the process of processing that (remote) dump is. Let me highlight a few
>>> > key issues:
>>> >
>>> > 1) There is no "one" Lucene SVN repository that can be transferred to
>>> > git. The history is a mess. Trunk, branches, tags -- all change paths at
>>> > various points in history. Entire projects are copied from *outside* the
>>> > official Lucene ASF path (when Solr, Nutch or Tika moved from the
>>> > incubator, for example).
>>> >
>>> > 2) The history of commits to Lucene's subpath of the SVN is ~50k commits.
>>> > ASF's commit history in which those 50k commits live is 1.8 *million*
>>> > commits. I think the git-svn sync crashes due to the sheer number of
>>> > (empty) commits in between actual changes.
>>> >
>>> > 3) There are a few commits that are gigantic. I mentioned Grant's 1.2G
>>> > patch, for example, but there are others (the second largest is 190 megs,
>>> > the third is 136 megs).
>>> >
>>> > 4) The size of JARs is really not an issue. The entire SVN repo I mirrored
>>> > locally (including empty interim commits to cater for svn:mergeinfos) is
>>> > 4G. If you strip the stuff like javadocs and side projects (Nutch, Tika,
>>> > Mahout) then I bet the entire history can fit in 1G total. Of course
>>> > stripping JARs is also doable.
>>> >
>>> > 5) There is lots of junk at the main SVN path so you can't just version
>>> > the top-level folder. If you wanted to checkout /asf/lucene then the size
>>> > of the resulting folder is enormous -- I terminated the checkout after I
>>> > reached over 20 gigs. Well, technically you *could* do it, it'd preserve
>>> > perfect history, but I wouldn't want to git co a past version that checks
>>> > out all the tags, branches, etc. This has to be mapped in a sensible way.
>>> >
>>> > What I think is that all the above makes (straightforward) conversion to
>>> > git problematic. Especially moving paths are a problem -- how to mark
>>> > tags/branches, where the main line of development is, etc. This conversion
>>> > would have to be guided and hand-tuned to make sense. This effort would
>>> > only pay for itself if we move to git, otherwise I don't see the benefit.
>>> > Paul's script is fine for keeping short-term history.
>>> >
>>> > Dawid
>>> >
>>> > P.S. Either the SVN repo at Apache is broken or the SVN is broken, which
>>> > makes processing SVN history even more fun. This dump indicates Tika
>>> > being moved from the incubator to Lucene:
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/ > out
>>> >
>>> > But when you dump just Lucene's subpath, the output is broken (last
>>> > changeset in the file is an invalid changeset, it carries no target):
>>> >
>>> > svnrdump dump -r 712381 --incremental https://svn.apache.org/repos/asf/lucene > out
>>> >
>>> >
>>> >
>>> > On Tue, Dec 15, 2015 at 6:04 PM, Yonik Seeley <ysee...@gmail.com> wrote:
>>> >>
>>> >> If we move to git, stripping out jars seems to be an independent
>>> >> decision?
>>> >> Can you even strip out jars and preserve history (i.e. not change
>>> >> hashes and invalidate everyone's forks/clones)?
>>> >> I did run across this:
>>> >>
>>> >>
>>> >> http://stackoverflow.com/questions/17470780/is-it-possible-to-slim-a-git-repository-without-rewriting-history
>>> >>
>>> >> -Yonik
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
>>> >>
>>> >
>>>
>>>
>>>
>>
> --
- Mark
about.me/markrmiller
