[ 
https://issues.apache.org/jira/browse/LUCENE-6933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064208#comment-15064208
 ] 

Dawid Weiss commented on LUCENE-6933:
-------------------------------------

I pushed a test repo with merged history to:
https://github.com/dweiss/lucene-solr-svn2git

A few remarks.

* I left only branches {{branch_3x}}, {{branch_4x}} and {{branch_5x}} as active 
branches. {{trunk}} becomes {{master}}.
* {{master}}'s history is not entirely up to date; if we switch to git, we can 
fill in the remaining commits manually by fast-forwarding.
* All the historical branches are tags under {{historical/branches/*}}; invoke 
{{git tag}} to see the list of tags.
* All releases are tagged in a consistent manner as 
{{releases/lucene,solr,lucene-solr/number}}. Previous "tags" from SVN are 
available under historical tags (see above).
* You can see "graft points" in history where the Solr, Lucene and Lucene-Solr 
histories are merged; see tags {{grafts/*}}.
* The size of the .git repo with all JARs inside was 455 MB. I truncated all the 
JARs to 0 bytes (but left their filenames in history); the size of the git repo 
after this dropped to 214 MB. There are still some large binary blobs (Kuromoji 
dictionaries, europarl, etc.). I'll see if I can reduce it even more, but this 
seems acceptable already.
* There are some oddball file permission issues on Windows. Use {{git config 
core.filemode false}} to ignore them.
* Check out {{master}} and issue {{git log --follow 
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java}}.
* The blame history may *not* be identical due to differences in how git and 
svn handle merges, etc., but the history of each file should be fairly accurate.
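
To illustrate the JAR truncation mentioned above, here is a minimal Python sketch of what a per-tree step could look like. It is an assumption for illustration only: the tooling actually used for the conversion is not described in this comment, and the name {{truncate_jars}} is hypothetical. The function empties every {{*.jar}} file under a directory while leaving the file names in place.

```python
import os


def truncate_jars(root):
    """Truncate every *.jar file under root to 0 bytes, keeping the file names.

    Hypothetical helper: a sketch of the idea, not the conversion's real tooling.
    Returns the sorted relative paths of the files that were truncated.
    """
    truncated = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            if name.lower().endswith(".jar"):
                path = os.path.join(dirpath, name)
                # Opening in "wb" mode truncates the file to zero length.
                with open(path, "wb"):
                    pass
                truncated.append(os.path.relpath(path, root))
    return sorted(truncated)
```

Applied to every historical tree during a rewrite, a step like this would keep each JAR's path visible in the history while removing its contents from the repository.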

> Create a (cleaned up) SVN history in git
> ----------------------------------------
>
>                 Key: LUCENE-6933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6933
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>         Attachments: multibranch-commits.log
>
>
> Goals:
> * selectively drop projects and core-irrelevant stuff:
>   ** {{lucene/site}}
>   ** {{lucene/nutch}}
>   ** {{lucene/lucy}}
>   ** {{lucene/tika}}
>   ** {{lucene/hadoop}}
>   ** {{lucene/mahout}}
>   ** {{lucene/pylucene}}
>   ** {{lucene/lucene.net}}
>   ** {{lucene/old_versioned_docs}}
>   ** {{lucene/openrelevance}}
>   ** {{lucene/board-reports}}
>   ** {{lucene/java/site}}
>   ** {{lucene/java/nightly}}
>   ** {{lucene/dev/nightly}}
>   ** {{lucene/dev/lucene2878}}
>   ** {{lucene/sandbox/luke}}
>   ** {{lucene/solr/nightly}}
> * preserve the history of all changes to core sources (Solr and Lucene).
>   ** {{lucene/java}}
>   ** {{lucene/solr}}
>   ** {{lucene/dev/trunk}}
>   ** {{lucene/dev/branches/branch_3x}}
>   ** {{lucene/dev/branches/branch_4x}}
>   ** {{lucene/dev/branches/branch_5x}}
> * provide a way to link git commits and history with svn revisions (amend the 
> log message).
> * annotate release tags
> * deal with large binary blobs (JARs): keep empty files instead for their 
> historical reference only.
> Non-goals:
> * no need to preserve "exact" merge history from SVN (see "impossible" below).
> * Ability to build ancient versions is not an issue.
> Impossible:
> * It is not possible to preserve SVN "merge history" for the following 
> reasons:
>   ** Each commit in SVN operates on individual files. So one commit can 
> "copy" (and record a merge) files from anywhere in the object tree, even 
> modifying them along the way. There simply is no equivalent for this in git. 
>   ** There are historical commits in SVN that apply changes to multiple 
> branches in one commit ({{r1569975}}) and merges *from* multiple branches in 
> one commit ({{r940806}}).
> * Because exact merge tracking is impossible, it follows that an exact 
> "linearized" history of a given file is also impossible to record. Let's say 
> changes X, Y and Z have been applied to a branch of a file A and then merged 
> back. In git, this would be reflected as a single commit flattening X, Y and 
> Z (on the target branch) and three independent commits on the branch. The 
> "copy-from" link from one branch to another cannot be represented because, as 
> mentioned, merges are done on entire branches in git, not on individual 
> files. Yes, there are commits in SVN history that have selective file merges 
> (not entire branches).
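
One of the goals quoted above is linking git commits with SVN revisions by amending the log message. A minimal Python sketch of that idea follows. The {{svn-revision:}} trailer format and both function names are hypothetical; the format actually used by the conversion is not stated in this issue.

```python
import re

# Hypothetical trailer format; the conversion's real format is not stated here.
_TRAILER = re.compile(r"^svn-revision: r(\d+)$", re.MULTILINE)


def annotate_message(message, revision):
    """Append an SVN revision trailer to a commit message."""
    body = message.rstrip("\n")
    return f"{body}\n\nsvn-revision: r{revision}\n"


def extract_revision(message):
    """Return the SVN revision recorded in a message, or None if absent."""
    match = _TRAILER.search(message)
    return int(match.group(1)) if match else None
```

With a trailer like this in place, something like {{git log --grep}} could be used to look up the commit for a given SVN revision.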



