The issue I am trying to come to grips with in the current design is
that the git repository for a number of interrelated projects will soon
become the union (the logical OR) of all blobs, commits, and trees in
ALL the projects.

This will involve horrendous amounts of replication, as developers end
up interchanging objects that originated with third parties who are not
party to the merge (and which happen to be in the repository because of
previous merge activity).

It would be really nice if the repositories for each project stayed
distinct (and maybe even lived on different servers). Merges should be
one of the few points of contact where state is exchanged between
the repositories.

>> If you don't keep track of the incremental merges, you end up with one
>> really _difficult_ merge that may not be mergable at all. Not
>> automatically, and perhaps not even with help.

No argument from me.... In my example, I didn't intend A2,A3,A4 to be
considered hashes of points where you did a merge - rather they are
hashes of individual points where you did a commit BETWEEN the points
where you merged. A1 and A5 are the "merge points"!

It makes perfect sense to define some subset of commits as "merge-like"
commits, and only have those copied over from one repository to the
other. You could also only use merge-points in the common ancestor
calculation, and not worry about intermediate commits. 
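
To make the merge-point idea concrete, here is a rough Python sketch (an
illustration only, not code from git). It assumes the commit graph is
available as a plain dict mapping each commit to its parents and that the
set of merge points is already known. The function names and the B1/B2
commits are made up for the example.

```python
from collections import deque

def ancestors(parents, start):
    """Return every commit reachable from start (inclusive) via parent links."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit = queue.popleft()
        for p in parents.get(commit, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

def merge_point_ancestor(parents, merge_points, ours, theirs):
    """Nearest shared ancestor that is flagged as a merge point, or None.

    Walks breadth-first from `theirs`, so the first hit is the closest one
    in parent hops. Commits that are not merge points are walked through
    but never returned.
    """
    ours_seen = ancestors(parents, ours)
    seen = {theirs}
    queue = deque([theirs])
    while queue:
        commit = queue.popleft()
        if commit in merge_points and commit in ours_seen:
            return commit
        for p in parents.get(commit, ()):
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return None

# Hypothetical graph: the maintainer's chain A1..A5, plus my own commits
# B1 and B2, which I made on top of A1.
parents = {
    "A1": [], "A2": ["A1"], "A3": ["A2"], "A4": ["A3"], "A5": ["A4"],
    "B1": ["A1"], "B2": ["B1"],
}
merge_points = {"A1", "A5"}
base = merge_point_ancestor(parents, merge_points, "B2", "A5")
```

With this graph, `base` comes out as A1. A2, A3, and A4 are only stepping
stones in the walk and are never candidates for the result.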

Only small changes to the existing logic are necessary to do a merge by
"distributing" out the merge algorithm to each repository. This involves
querying each repository, and communicating the results, followed by
copying over only those blob objects necessary for the merge. 
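
As a sketch of the "copy only what the merge needs" step, assume each
repository can answer a query for a tree as a flat mapping of path to blob
hash (real git trees are nested, so this is a simplification). The function
name `blobs_to_fetch` and the blob names are hypothetical.

```python
def blobs_to_fetch(base, ours, theirs, local_objects):
    """Return the blob hashes a three-way merge needs that are not local.

    base, ours, theirs: dicts mapping path -> blob hash (None/absent = no file).
    local_objects: set of blob hashes already in this repository.
    """
    needed = set()
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if t == b or t == o:
            # Their side did not change the file, or both sides already agree.
            continue
        if t is not None:
            # Their version is either taken as-is or fed to a file-level merge.
            needed.add(t)
        if o != b and b is not None:
            # Both sides changed the file: a file-level merge wants the base too.
            needed.add(b)
    return needed - local_objects

# Hypothetical trees at the common merge point, my head, and the other head.
base_tree   = {"Makefile": "m1", "a.c": "a1", "b.c": "b1"}
our_tree    = {"Makefile": "m1", "a.c": "a2", "b.c": "b1"}
their_tree  = {"Makefile": "m2", "a.c": "a3", "b.c": "b1", "c.c": "c1"}
local_blobs = {"m1", "a1", "a2", "b1"}

to_copy = blobs_to_fetch(base_tree, our_tree, their_tree, local_blobs)
```

Here only m2, a3, and c1 would have to cross between the repositories.
Nothing from the other side's intermediate commits is requested.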

After the merge, you would create a "merge-point commit" record that has
one of the parents pointing to a hash in the other repository!
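
One possible shape for such a record, sketched in Python. This is NOT the
existing git commit format, which has no notion of a repository on a
parent. The `repo` field, the class names, and the example URL are all
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ParentRef:
    sha: str
    # Hypothetical extension: where the parent lives. None = this repository.
    repo: Optional[str] = None

@dataclass(frozen=True)
class Commit:
    tree: str
    parents: Tuple[ParentRef, ...]
    message: str

    def is_merge_point(self) -> bool:
        """True if any parent refers to an object in another repository."""
        return any(p.repo is not None for p in self.parents)

    def local_parents(self) -> Tuple[ParentRef, ...]:
        """Parents that a purely local history walk can follow."""
        return tuple(p for p in self.parents if p.repo is None)

# Placeholder names stand in for real hashes.
merge_commit = Commit(
    tree="T-merged",
    parents=(
        ParentRef("B2"),                                   # my own history
        ParentRef("A5", repo="git://example.org/area.git"),  # other repository
    ),
    message="Merge with area maintainer at A5",
)
```

A history walker would follow `local_parents()` as usual and treat the
remote parent as an opaque pointer until the next merge needs it.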

But the BIG issue with this scheme is that you will not be replicating
any of the intermediate commits, trees, or blobs. The merge does not
really need them, but various plumbing components currently traverse
them.
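
The problem can be shown with a generic history walk (a sketch, not any
actual git tool). It assumes the local object store is a dict from commit
to parent list. `walk_history`, M, and B1 are made-up names.

```python
def walk_history(store, head):
    """Follow parent links from head.

    Returns (visited, missing): commits found locally, and parent hashes
    that are referenced but absent from the local store.
    """
    visited, missing = [], []
    seen = set()
    stack = [head]
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        if sha not in store:
            # Referenced by a local commit, but the object was never copied.
            missing.append(sha)
            continue
        visited.append(sha)
        stack.extend(store[sha])
    return visited, missing

# My repository after the merge: M has parents B1 (mine) and A5 (theirs),
# but A5 and everything behind it stayed in the other repository.
local_store = {"A1": [], "B1": ["A1"], "M": ["B1", "A5"]}
visited, missing = walk_history(local_store, "M")
```

The walk reports A5 as missing. Any tool that assumes every parent is
present locally would have to learn to stop at that boundary instead.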

Hence my question....

-----Original Message-----
From: Linus Torvalds [mailto:[EMAIL PROTECTED] 
Sent: Friday, April 15, 2005 4:31 PM
To: Barry Silverman
Cc: git@vger.kernel.org
Subject: RE: Merge with git-pasky II.


[ I'm cc'ing the git list even though Barry's question wasn't cc'd. 
  Because I think his question is interesting and astute per se, even
  if I disagree with the proposal ]

On Fri, 15 Apr 2005, Barry Silverman wrote:
>
> If git is totally project based, and each commit represents total state
> of the project, then how important is the intermediate commit
> information between two states.

You need it in order to do further merges.

> IE, Area maintainer has A1->A2->A3->A4->A5 in a repository with 5
> commits, and 5 comments. And I last synced with A1.
> 
> A few days later I sync again. Couldn't I just pull the "diff-tree A5
> A1" and then commit to my tree just the record A1->A5. Why does MY
> repository need trees A2,A3,A4?

Because that second merge needs the first merge to work well. The first
merge might have had some small differences that ended up auto-merging
(or even needing some manual help from you). The second time you sync,
there might be some more automatic merging. And so on.

If you don't keep track of the incremental merges, you end up with one 
really _difficult_ merge that may not be mergable at all. Not 
automatically, and perhaps not even with help.

So in order to keep things mergable, you need to not diverge. And the 
intermediate merges are the "anchor-points" for the next merge, keeping 
the divergences minimal. 

I'm personally convinced that one of the reasons CVS is a pain to merge
is obviously that it doesn't do a good job of finding parents, but also
exactly _because_ it makes merges so painful that people wait longer to
do them, so you never end up fixing the simple stuff. In contrast, if you
have all these small merges going on all the time, the hope is that
there's never any really painful nasty "final merge".

So you're right - the small merges do end up cluttering up the revision
history. But it's a small price to pay if it means that you avoid having
the painful ones.

> Isn't preserving the A1,A2,A3,A4,A5 a legacy of BK, which required all
> the changesets to be loaded in order, and so is a completely "file"
> driven concept? 

Nope. In fact, to some degree git will need this even _more_, since the
git merger is likely to be _weaker_ than BK, and thus more easily
confused.

I do believe that BK has these things for the same reason.

                        Linus


