Re: mergetool: what to do about deleting precious files?

2017-05-30 Thread Junio C Hamano
"Philip Oakley"  writes:

> From: "Junio C Hamano" 
>
> Thanks for the replies. Let's see if I've got it...
>
>> "Philip Oakley"  writes:
>>
>>> If I now understand correctly, the merge process flow is:
>>>
>>> * canonicalise content (eol, smudge-clean, $id, renormalise, etc)
>>> * diff the content (internal, or GIT_EXTERNAL_DIFF)
>>> * apply the diff
>>> * if conflicts, only then use merge-driver/tool
>>>
>>> Would that be a correct interpretation?
>>
>> Not quite.  There are a lot more going on before any of those steps:
>>
>> * Find the common ancestor commit (which could be many).
>
> IIUC Git selects one of them, rather than all if there are many (which
> then may not be the optimum)

Not quite.  The interface to "git merge-$backend" can take more than
one and "git merge" frontend does pass them to the backend.  How
they are used depends on the backend.  The "resolve" one tries to
use all of them at once; the "recursive" one tries merge across them
to come up with a tree to be used as a single "virtual common
ancestor".  But details does not matter for the purpose of analysing
the case that triggered this discussion.

>>
>> * Walk the three trees (the common ancestor's, ours and theirs) in
>>   parallel, noticing what happened to each path.  Depending on what
>>   happened to the path in each branch, the merge may or may not
>>   "conflict" (e.g. when both sides added exactly the same contents
>>   to the same path, they are not counted as conflicting.  when we
>>   removed while they modified, they show as conflicting).
>
> I'm assuming here that this is the sha-oid comparison, and then
> checking the tree/blob names that match them. (the top tree not having
> a name). So here "conflict free" is that the sha-oids match.
>
> Also, I thnk this is saying that added or removed trees or blobs are
> in some sense are 'conflict free' (though still subject to rename/move
> detection etc). An added file/blob would be conflict free for merging
> into it's tree, yes?

After "recursive" figures out the renames, an addition that still
remains (i.e. not matched up with a deletion elsewhere) would be a
candidate to be added silently (except that D/F conflict can still
be diagnosed).

>> * For paths that are conflicting, feed the canonicalized content of
>>   the versions from common, ours and theirs to the file-level merge
>>   driver.
>
> So this is where any .gitattibutes settings come in, or is the merge
> driver after the diff step? (which could also be a user diff?)

I think you answered this yourself in your "Ok, I think I can see
how I was confused..." paragraph.


Re: mergetool: what to do about deleting precious files?

2017-05-30 Thread Philip Oakley

From: "Junio C Hamano" 

Thanks for the replies. Let's see if I've got it...


"Philip Oakley"  writes:


If I now understand correctly, the merge process flow is:

* canonicalise content (eol, smudge-clean, $id, renormalise, etc)
* diff the content (internal, or GIT_EXTERNAL_DIFF)
* apply the diff
* if conflicts, only then use merge-driver/tool

Would that be a correct interpretation?


Not quite.  There are a lot more going on before any of those steps:

* Find the common ancestor commit (which could be many).


IIUC Git selects one of them, rather than all if there are many (which then 
may not be the optimum)




* Walk the three trees (the common ancestor's, ours and theirs) in
  parallel, noticing what happened to each path.  Depending on what
  happened to the path in each branch, the merge may or may not
  "conflict" (e.g. when both sides added exactly the same contents
  to the same path, they are not counted as conflicting.  when we
  removed while they modified, they show as conflicting).


I'm assuming here that this is the sha-oid comparison, and then checking the 
tree/blob names that match them. (the top tree not having a name). So here 
"conflict free" is that the sha-oids match.


Also, I thnk this is saying that added or removed trees or blobs are in some 
sense are 'conflict free' (though still subject to rename/move detection 
etc). An added file/blob would be conflict free for merging into it's tree, 
yes?


IIUC, the comparison is therefore using the in-repo sha-oids; 
unless --renormalise was given which will do a smudge-clean washing cycle 
and recomute fresh canonical sha-oids for the comparison (rather than doing 
it later).




* For paths that are conflicting, feed the canonicalized content of
  the versions from common, ours and theirs to the file-level merge
  driver.


So this is where any .gitattibutes settings come in, or is the merge driver 
after the diff step? (which could also be a user diff?)



   The builtin file-level merge driver takes two xdiff (one
  between ancestor and ours, the other between ancestore and
  theirs) and reconciles them to produce the result.  But that is
  irrelevant in the context of "custom merge driver"; the builtin
  one is skipped altogether and the custom contents merge driver
  the user specified via the attributes is used instead.

Notice that the second step above has no customization knobs.  Any
path the second step deems not to conflict is "merged cleanly"
without even triggering the "oops, ours and theirs did conflicting
changes, to the content; let's see how the final content should look
like" (aka the third step).  This is *not* because "Git knows the
best"; it is merely that nobody felt the need for a mechanism to
allow customizing the second step.

And that is why I said you need a new customization mechanism if you
want to affect the outcome of the scenario that started this thread.


Ok, I think I can see how I was confused between the "tree merge" (oid 
conflict detection) and the more usual (to users) "file merge" (line by 
line, etc.). I wasn't sure where to find that as someone relatively new to 
Git.


Thanks for the explanations.
--
Philip 



---
This email has been checked for viruses by AVG.
http://www.avg.com



Re: mergetool: what to do about deleting precious files?

2017-05-29 Thread Junio C Hamano
"Philip Oakley"  writes:

> If I now understand correctly, the merge process flow is:
>
> * canonicalise content (eol, smudge-clean, $id, renormalise, etc)
> * diff the content (internal, or GIT_EXTERNAL_DIFF)
> * apply the diff
> * if conflicts, only then use merge-driver/tool
>
> Would that be a correct interpretation?

Not quite.  There are a lot more going on before any of those steps:

 * Find the common ancestor commit (which could be many).

 * Walk the three trees (the common ancestor's, ours and theirs) in
   parallel, noticing what happened to each path.  Depending on what
   happened to the path in each branch, the merge may or may not
   "conflict" (e.g. when both sides added exactly the same contents
   to the same path, they are not counted as conflicting.  when we
   removed while they modified, they show as conflicting).

 * For paths that are conflicting, feed the canonicalized content of
   the versions from common, ours and theirs to the file-level merge
   driver.  The builtin file-level merge driver takes two xdiff (one
   between ancestor and ours, the other between ancestore and
   theirs) and reconciles them to produce the result.  But that is
   irrelevant in the context of "custom merge driver"; the builtin
   one is skipped altogether and the custom contents merge driver
   the user specified via the attributes is used instead.

Notice that the second step above has no customization knobs.  Any
path the second step deems not to conflict is "merged cleanly"
without even triggering the "oops, ours and theirs did conflicting
changes, to the content; let's see how the final content should look
like" (aka the third step).  This is *not* because "Git knows the
best"; it is merely that nobody felt the need for a mechanism to
allow customizing the second step.

And that is why I said you need a new customization mechanism if you
want to affect the outcome of the scenario that started this thread.


Re: mergetool: what to do about deleting precious files?

2017-05-29 Thread Philip Oakley

From: "Junio C Hamano" 

"Philip Oakley"  writes:


So I do not think this is not limited to "new file".  Anything that
a tree-level three-way merge would resolve cleanly without having to
consult the content-level three-way merge will complete without
consulting the merge.ours.driver; per-file content-level three-way
merge driver (which is what merge= mechanism lets you
specify via the attributes mechanism) is not something you would
want to use for this kind of thing.  It is purely for resolving the
actual content-level conflicts.


That (that Git knows best) sounds just wrong.


Don't twist my words.  I never said Git knows best.


The part I was responding to was "would resolve cleanly without having to
consult the content-level three-way merge will complete without
consulting the merge.ours.driver".

It was that lack of consultation (by git) of the putative merge-driver that 
was being noted.


The general misunderstanding, as I now see it, is the (false) expectation 
that a merge-driver would do the whole merge process.


It took a bit of digging through the documentation for me to find out just 
what the merge process appears to be. I'm sure that it obvious to those who 
have worked with git from the beginning and the previous patch flow process, 
but the merge process wasn't obvious to me, and various blogs and SO Q on 
the issue suggest the same for many others.


If I now understand correctly, the merge process flow is:

* canonicalise content (eol, smudge-clean, $id, renormalise, etc)
* diff the content (internal, or GIT_EXTERNAL_DIFF)
* apply the diff
* if conflicts, only then use merge-driver/tool

Would that be a correct interpretation?





The user-level merge driver is a mechanism to affect conflict level
three-way merges.  The interface to the content level three-way
merge driver feeds three versions of blobs and the driver is
expected to give a merged result.  The interface as designed is
incapable of passing "here is the common ancestor", "our side is
missing" and "their side is this content".

So if we want a mechanism that can affect the outcome of tree-level
three-way merge, we need a _new_ mechanism.  The existing merge
drivers that are written by end users (at least the ones written
correctly to the spec, anyway) are not expecting to be called with
"in our tree, there is no blob here", and trying to piggyback on it
will break existing users.


Is an alternative to use the GIT_EXTERNAL_DIFF to create a nul diff, so no 
changes are applied (precious/sensitive file is left behind)? This would 
have no conflicts and no requirement for a merge-conflict driver.


--

Philip 



Re: mergetool: what to do about deleting precious files?

2017-05-28 Thread Junio C Hamano
"Philip Oakley"  writes:

>> So I do not think this is not limited to "new file".  Anything that
>> a tree-level three-way merge would resolve cleanly without having to
>> consult the content-level three-way merge will complete without
>> consulting the merge.ours.driver; per-file content-level three-way
>> merge driver (which is what merge= mechanism lets you
>> specify via the attributes mechanism) is not something you would
>> want to use for this kind of thing.  It is purely for resolving the
>> actual content-level conflicts.
>>
> That (that Git knows best) sounds just wrong.

Don't twist my words.  I never said Git knows best.  

The user-level merge driver is a mechanism to affect conflict level
three-way merges.  The interface to the content level three-way
merge driver feeds three versions of blobs and the driver is
expected to give a merged result.  The interface as designed is
incapable of passing "here is the common ancestor", "our side is
missing" and "their side is this content".

So if we want a mechanism that can affect the outcome of tree-level
three-way merge, we need a _new_ mechanism.  The existing merge
drivers that are written by end users (at least the ones written
correctly to the spec, anyway) are not expecting to be called with
"in our tree, there is no blob here", and trying to piggyback on it
will break existing users.


Re: mergetool: what to do about deleting precious files?

2017-05-28 Thread Philip Oakley

From: "Junio C Hamano" 

"Philip Oakley"  writes:

The git book [1] and a few blog posts [2] show how to preserve files 
which

are in the current branch against changes that are on the branch being
merged in.

e.g. (from [2])

echo ' merge=ours' >> .gitattributes && # commit
git config --global merge.ours.driver true

(test) $ git checkout demo
(demo) $ git merge -
#  contents are not merged and the original retained.



However what is not covered (at least in the documentation ) is the case
where the file to be ignored is not present on the current branch, but is
present on the branch to be merged in.


Hmph.  Per-path 'ours' and 'theirs' kick in only after we decide to
perform the content level three-way merge.  I wonder what would (not
"should", but "would with the current code") happen, with the same
attribute setting, if the file being merged were not changed by ours
but modified by the side branch?  I suspect that we'd take the change
made by the side branch.


Here the 'ours' strategy is defined by the user's config file merge driver 
list.


I'd understood it that once it was decided there was a merge to be performed 
(the repective blob oid's in the repo/index are different) that the problem 
of merging is then handed off to the declared merge driver.





Normal expectations would be that in such a case the new file from the
second parent branch would be added to the current branch.




The git-scm and blog posts suggest that the original is left in place at the 
%P path, the merge driver run, and its return values used to decide if the 
user has to go and resolve conflicts. By setting the driver to 'true', the 
result is then said to be that the current 'blob' (i.e. file) is accepted 
unchanged (in %P), so anything from the second parent blob was completely 
ignored.


However if we have the addition of a new file, I can't tell from the docs 
what should happen? Is this still a merge such that the merge driver is 
called, or is the added file accepted without recourse to its .gitattributes 
setting (surely that would be a bug).


Then assuming we have reached an external driver, and it wants to not add 
that very file that was added in the second parent branch, what does the %P 
path point to (/dev/null?) - in particular, shouldn't the docs say? (I've 
not tested, and one test is not proof)


It maybe that the user wants a merge driver that says "If I ever see a 
secret key or password, then remove the whole file", which (removing the 
file from the merge) is a currently undocumented process (if even possible).



So I do not think this is not limited to "new file".  Anything that
a tree-level three-way merge would resolve cleanly without having to
consult the content-level three-way merge will complete without
consulting the merge.ours.driver; per-file content-level three-way
merge driver (which is what merge= mechanism lets you
specify via the attributes mechanism) is not something you would
want to use for this kind of thing.  It is purely for resolving the
actual content-level conflicts.

That (that Git knows best) sounds just wrong. If the user has set a file 
attribute strategy, why would we ignore it? We already have different 
internal strategies anyway, so how do we even know that the potential merge 
was conflict free if we have haven't checked its attribute type. Maybe I'm 
missing something.

--
Philip 



Re: mergetool: what to do about deleting precious files?

2017-05-27 Thread Junio C Hamano
"Philip Oakley"  writes:

> The git book [1] and a few blog posts [2] show how to preserve files which
> are in the current branch against changes that are on the branch being
> merged in.
>
> e.g. (from [2])
>
> echo ' merge=ours' >> .gitattributes && # commit
> git config --global merge.ours.driver true
>
> (test) $ git checkout demo
> (demo) $ git merge -
> #  contents are not merged and the original retained.
>
>
>
> However what is not covered (at least in the documentation ) is the case
> where the file to be ignored is not present on the current branch, but is
> present on the branch to be merged in.

Hmph.  Per-path 'ours' and 'theirs' kick in only after we decide to
perform the content level three-way merge.  I wonder what would (not
"should", but "would with the current code") happen, with the same
attribute setting, if the file being merged were not changed by ours
but modified by the side branch?  I suspect that we'd take the change
made by the side branch.

> Normal expectations would be that in such a case the new file from the
> second parent branch would be added to the current branch.

So I do not think this is not limited to "new file".  Anything that
a tree-level three-way merge would resolve cleanly without having to
consult the content-level three-way merge will complete without
consulting the merge.ours.driver; per-file content-level three-way
merge driver (which is what merge= mechanism lets you
specify via the attributes mechanism) is not something you would
want to use for this kind of thing.  It is purely for resolving the
actual content-level conflicts.