Re: mergetool: what to do about deleting precious files?
"Philip Oakley"writes: > From: "Junio C Hamano" > > Thanks for the replies. Let's see if I've got it... > >> "Philip Oakley" writes: >> >>> If I now understand correctly, the merge process flow is: >>> >>> * canonicalise content (eol, smudge-clean, $id, renormalise, etc) >>> * diff the content (internal, or GIT_EXTERNAL_DIFF) >>> * apply the diff >>> * if conflicts, only then use merge-driver/tool >>> >>> Would that be a correct interpretation? >> >> Not quite. There are a lot more going on before any of those steps: >> >> * Find the common ancestor commit (which could be many). > > IIUC Git selects one of them, rather than all if there are many (which > then may not be the optimum) Not quite. The interface to "git merge-$backend" can take more than one and "git merge" frontend does pass them to the backend. How they are used depends on the backend. The "resolve" one tries to use all of them at once; the "recursive" one tries merge across them to come up with a tree to be used as a single "virtual common ancestor". But details does not matter for the purpose of analysing the case that triggered this discussion. >> >> * Walk the three trees (the common ancestor's, ours and theirs) in >> parallel, noticing what happened to each path. Depending on what >> happened to the path in each branch, the merge may or may not >> "conflict" (e.g. when both sides added exactly the same contents >> to the same path, they are not counted as conflicting. when we >> removed while they modified, they show as conflicting). > > I'm assuming here that this is the sha-oid comparison, and then > checking the tree/blob names that match them. (the top tree not having > a name). So here "conflict free" is that the sha-oids match. > > Also, I thnk this is saying that added or removed trees or blobs are > in some sense are 'conflict free' (though still subject to rename/move > detection etc). An added file/blob would be conflict free for merging > into it's tree, yes? After "recursive" figures out the renames, an addition that still remains (i.e. not matched up with a deletion elsewhere) would be a candidate to be added silently (except that D/F conflict can still be diagnosed). >> * For paths that are conflicting, feed the canonicalized content of >> the versions from common, ours and theirs to the file-level merge >> driver. > > So this is where any .gitattibutes settings come in, or is the merge > driver after the diff step? (which could also be a user diff?) I think you answered this yourself in your "Ok, I think I can see how I was confused..." paragraph.
Re: mergetool: what to do about deleting precious files?
From: "Junio C Hamano"Thanks for the replies. Let's see if I've got it... "Philip Oakley" writes: If I now understand correctly, the merge process flow is: * canonicalise content (eol, smudge-clean, $id, renormalise, etc) * diff the content (internal, or GIT_EXTERNAL_DIFF) * apply the diff * if conflicts, only then use merge-driver/tool Would that be a correct interpretation? Not quite. There are a lot more going on before any of those steps: * Find the common ancestor commit (which could be many). IIUC Git selects one of them, rather than all if there are many (which then may not be the optimum) * Walk the three trees (the common ancestor's, ours and theirs) in parallel, noticing what happened to each path. Depending on what happened to the path in each branch, the merge may or may not "conflict" (e.g. when both sides added exactly the same contents to the same path, they are not counted as conflicting. when we removed while they modified, they show as conflicting). I'm assuming here that this is the sha-oid comparison, and then checking the tree/blob names that match them. (the top tree not having a name). So here "conflict free" is that the sha-oids match. Also, I thnk this is saying that added or removed trees or blobs are in some sense are 'conflict free' (though still subject to rename/move detection etc). An added file/blob would be conflict free for merging into it's tree, yes? IIUC, the comparison is therefore using the in-repo sha-oids; unless --renormalise was given which will do a smudge-clean washing cycle and recomute fresh canonical sha-oids for the comparison (rather than doing it later). * For paths that are conflicting, feed the canonicalized content of the versions from common, ours and theirs to the file-level merge driver. So this is where any .gitattibutes settings come in, or is the merge driver after the diff step? (which could also be a user diff?) The builtin file-level merge driver takes two xdiff (one between ancestor and ours, the other between ancestore and theirs) and reconciles them to produce the result. But that is irrelevant in the context of "custom merge driver"; the builtin one is skipped altogether and the custom contents merge driver the user specified via the attributes is used instead. Notice that the second step above has no customization knobs. Any path the second step deems not to conflict is "merged cleanly" without even triggering the "oops, ours and theirs did conflicting changes, to the content; let's see how the final content should look like" (aka the third step). This is *not* because "Git knows the best"; it is merely that nobody felt the need for a mechanism to allow customizing the second step. And that is why I said you need a new customization mechanism if you want to affect the outcome of the scenario that started this thread. Ok, I think I can see how I was confused between the "tree merge" (oid conflict detection) and the more usual (to users) "file merge" (line by line, etc.). I wasn't sure where to find that as someone relatively new to Git. Thanks for the explanations. -- Philip --- This email has been checked for viruses by AVG. http://www.avg.com
Re: mergetool: what to do about deleting precious files?
"Philip Oakley"writes: > If I now understand correctly, the merge process flow is: > > * canonicalise content (eol, smudge-clean, $id, renormalise, etc) > * diff the content (internal, or GIT_EXTERNAL_DIFF) > * apply the diff > * if conflicts, only then use merge-driver/tool > > Would that be a correct interpretation? Not quite. There are a lot more going on before any of those steps: * Find the common ancestor commit (which could be many). * Walk the three trees (the common ancestor's, ours and theirs) in parallel, noticing what happened to each path. Depending on what happened to the path in each branch, the merge may or may not "conflict" (e.g. when both sides added exactly the same contents to the same path, they are not counted as conflicting. when we removed while they modified, they show as conflicting). * For paths that are conflicting, feed the canonicalized content of the versions from common, ours and theirs to the file-level merge driver. The builtin file-level merge driver takes two xdiff (one between ancestor and ours, the other between ancestore and theirs) and reconciles them to produce the result. But that is irrelevant in the context of "custom merge driver"; the builtin one is skipped altogether and the custom contents merge driver the user specified via the attributes is used instead. Notice that the second step above has no customization knobs. Any path the second step deems not to conflict is "merged cleanly" without even triggering the "oops, ours and theirs did conflicting changes, to the content; let's see how the final content should look like" (aka the third step). This is *not* because "Git knows the best"; it is merely that nobody felt the need for a mechanism to allow customizing the second step. And that is why I said you need a new customization mechanism if you want to affect the outcome of the scenario that started this thread.
Re: mergetool: what to do about deleting precious files?
From: "Junio C Hamano""Philip Oakley" writes: So I do not think this is not limited to "new file". Anything that a tree-level three-way merge would resolve cleanly without having to consult the content-level three-way merge will complete without consulting the merge.ours.driver; per-file content-level three-way merge driver (which is what merge= mechanism lets you specify via the attributes mechanism) is not something you would want to use for this kind of thing. It is purely for resolving the actual content-level conflicts. That (that Git knows best) sounds just wrong. Don't twist my words. I never said Git knows best. The part I was responding to was "would resolve cleanly without having to consult the content-level three-way merge will complete without consulting the merge.ours.driver". It was that lack of consultation (by git) of the putative merge-driver that was being noted. The general misunderstanding, as I now see it, is the (false) expectation that a merge-driver would do the whole merge process. It took a bit of digging through the documentation for me to find out just what the merge process appears to be. I'm sure that it obvious to those who have worked with git from the beginning and the previous patch flow process, but the merge process wasn't obvious to me, and various blogs and SO Q on the issue suggest the same for many others. If I now understand correctly, the merge process flow is: * canonicalise content (eol, smudge-clean, $id, renormalise, etc) * diff the content (internal, or GIT_EXTERNAL_DIFF) * apply the diff * if conflicts, only then use merge-driver/tool Would that be a correct interpretation? The user-level merge driver is a mechanism to affect conflict level three-way merges. The interface to the content level three-way merge driver feeds three versions of blobs and the driver is expected to give a merged result. The interface as designed is incapable of passing "here is the common ancestor", "our side is missing" and "their side is this content". So if we want a mechanism that can affect the outcome of tree-level three-way merge, we need a _new_ mechanism. The existing merge drivers that are written by end users (at least the ones written correctly to the spec, anyway) are not expecting to be called with "in our tree, there is no blob here", and trying to piggyback on it will break existing users. Is an alternative to use the GIT_EXTERNAL_DIFF to create a nul diff, so no changes are applied (precious/sensitive file is left behind)? This would have no conflicts and no requirement for a merge-conflict driver. -- Philip
Re: mergetool: what to do about deleting precious files?
"Philip Oakley"writes: >> So I do not think this is not limited to "new file". Anything that >> a tree-level three-way merge would resolve cleanly without having to >> consult the content-level three-way merge will complete without >> consulting the merge.ours.driver; per-file content-level three-way >> merge driver (which is what merge= mechanism lets you >> specify via the attributes mechanism) is not something you would >> want to use for this kind of thing. It is purely for resolving the >> actual content-level conflicts. >> > That (that Git knows best) sounds just wrong. Don't twist my words. I never said Git knows best. The user-level merge driver is a mechanism to affect conflict level three-way merges. The interface to the content level three-way merge driver feeds three versions of blobs and the driver is expected to give a merged result. The interface as designed is incapable of passing "here is the common ancestor", "our side is missing" and "their side is this content". So if we want a mechanism that can affect the outcome of tree-level three-way merge, we need a _new_ mechanism. The existing merge drivers that are written by end users (at least the ones written correctly to the spec, anyway) are not expecting to be called with "in our tree, there is no blob here", and trying to piggyback on it will break existing users.
Re: mergetool: what to do about deleting precious files?
From: "Junio C Hamano""Philip Oakley" writes: The git book [1] and a few blog posts [2] show how to preserve files which are in the current branch against changes that are on the branch being merged in. e.g. (from [2]) echo ' merge=ours' >> .gitattributes && # commit git config --global merge.ours.driver true (test) $ git checkout demo (demo) $ git merge - # contents are not merged and the original retained. However what is not covered (at least in the documentation ) is the case where the file to be ignored is not present on the current branch, but is present on the branch to be merged in. Hmph. Per-path 'ours' and 'theirs' kick in only after we decide to perform the content level three-way merge. I wonder what would (not "should", but "would with the current code") happen, with the same attribute setting, if the file being merged were not changed by ours but modified by the side branch? I suspect that we'd take the change made by the side branch. Here the 'ours' strategy is defined by the user's config file merge driver list. I'd understood it that once it was decided there was a merge to be performed (the repective blob oid's in the repo/index are different) that the problem of merging is then handed off to the declared merge driver. Normal expectations would be that in such a case the new file from the second parent branch would be added to the current branch. The git-scm and blog posts suggest that the original is left in place at the %P path, the merge driver run, and its return values used to decide if the user has to go and resolve conflicts. By setting the driver to 'true', the result is then said to be that the current 'blob' (i.e. file) is accepted unchanged (in %P), so anything from the second parent blob was completely ignored. However if we have the addition of a new file, I can't tell from the docs what should happen? Is this still a merge such that the merge driver is called, or is the added file accepted without recourse to its .gitattributes setting (surely that would be a bug). Then assuming we have reached an external driver, and it wants to not add that very file that was added in the second parent branch, what does the %P path point to (/dev/null?) - in particular, shouldn't the docs say? (I've not tested, and one test is not proof) It maybe that the user wants a merge driver that says "If I ever see a secret key or password, then remove the whole file", which (removing the file from the merge) is a currently undocumented process (if even possible). So I do not think this is not limited to "new file". Anything that a tree-level three-way merge would resolve cleanly without having to consult the content-level three-way merge will complete without consulting the merge.ours.driver; per-file content-level three-way merge driver (which is what merge= mechanism lets you specify via the attributes mechanism) is not something you would want to use for this kind of thing. It is purely for resolving the actual content-level conflicts. That (that Git knows best) sounds just wrong. If the user has set a file attribute strategy, why would we ignore it? We already have different internal strategies anyway, so how do we even know that the potential merge was conflict free if we have haven't checked its attribute type. Maybe I'm missing something. -- Philip
Re: mergetool: what to do about deleting precious files?
"Philip Oakley"writes: > The git book [1] and a few blog posts [2] show how to preserve files which > are in the current branch against changes that are on the branch being > merged in. > > e.g. (from [2]) > > echo ' merge=ours' >> .gitattributes && # commit > git config --global merge.ours.driver true > > (test) $ git checkout demo > (demo) $ git merge - > # contents are not merged and the original retained. > > > > However what is not covered (at least in the documentation ) is the case > where the file to be ignored is not present on the current branch, but is > present on the branch to be merged in. Hmph. Per-path 'ours' and 'theirs' kick in only after we decide to perform the content level three-way merge. I wonder what would (not "should", but "would with the current code") happen, with the same attribute setting, if the file being merged were not changed by ours but modified by the side branch? I suspect that we'd take the change made by the side branch. > Normal expectations would be that in such a case the new file from the > second parent branch would be added to the current branch. So I do not think this is not limited to "new file". Anything that a tree-level three-way merge would resolve cleanly without having to consult the content-level three-way merge will complete without consulting the merge.ours.driver; per-file content-level three-way merge driver (which is what merge= mechanism lets you specify via the attributes mechanism) is not something you would want to use for this kind of thing. It is purely for resolving the actual content-level conflicts.