RE: [PATCH 1 of 2 v9] graft: support grafting across move/copy (issue4028)

Gábor STEFANIK Mon, 03 Oct 2016 03:30:02 -0700

--------------------------------------------------------------------------
This message, including its attachments, is confidential. For more information 
please read NNG's email policy here:
http://www.nng.com/emailpolicy/
By responding to this email you accept the email policy.


-----Original Message-----
> From: Yuya Nishihara [mailto:you...@gmail.com] On Behalf Of Yuya
> Nishihara
> Sent: Sunday, October 2, 2016 6:02 PM
> To: Gábor STEFANIK <gabor.stefa...@nng.com>
> Cc: mercurial-devel@mercurial-scm.org
> Subject: Re: [PATCH 1 of 2 v9] graft: support grafting across move/copy
> (issue4028)
>
> On Fri, 16 Sep 2016 15:45:18 -0500, Gábor Stefanik wrote:
> > # HG changeset patch
> > # User Gábor Stefanik <gabor.stefa...@nng.com> # Date 1474058125 -7200
> > #      Fri Sep 16 22:35:25 2016 +0200
> > # Node ID 63a2ea1bc3aa51caaf22bdfe31e7ae685ac3894f
> > # Parent  285a8c3e53f2183438f0cdbc238e4ab851d0d110
> > graft: support grafting across move/copy (issue4028)
>
> > diff --git a/mercurial/copies.py b/mercurial/copies.py
> > --- a/mercurial/copies.py
> > +++ b/mercurial/copies.py
> > @@ -321,6 +321,23 @@
> >      if repo.ui.configbool('experimental', 'disablecopytrace'):
> >          return {}, {}, {}, {}
> >
> > +    # In certain scenarios (e.g. graft, update or rebase), ca can be
> overridden
> > +    # We still need to know a real common ancestor in this case
> > +    # We can't just compute _c1.ancestor(_c2) and compare it to ca,
> because
> > +    # there can be multiple common ancestors, e.g. in case of bidmerge.
> > +    # Because our caller may not know if the revision passed in lieu of the
> CA
> > +    # is a genuine common ancestor or not without explicitly checking it, 
> > it's
> > +    # better to determine that here.
> > +    cta = ca
> > +    # ca.descendant(wc) and ca.descendant(ca) are False, work around that
> > +    _c1 = c1.p1() if c1.rev() is None else c1
> > +    _c2 = c2.p1() if c2.rev() is None else c2
> > +    dirty_c1 = not (ca == _c1 or ca.descendant(_c1))
> > +    dirty_c2 = not (ca == _c2 or ca.descendant(_c2))
> > +    graft = dirty_c1 or dirty_c2
> > +    if graft:
> > +        cta = _c1.ancestor(_c2)
>
> Can you fix ctx.descendant() to handle wctx? or maybe this could be
>
>   dirtyc<n> = ca != ca.ancestor(c<n>)  # for n = 1, 2
>
> and one of ca.ancestor(c<n>) would be cta. (I don't carefully investigate 
> criss-
> cross merge case, so I might be wrong.)

ctx.ancestor() always returns just one common ancestor, not all possible 
candidates.

bidmerge uses a function called commonancestorsheads() instead (not sure where 
that
can be found), but a non-head common ancestor also shouldn't trigger graft 
logic.

So unfortunately there is no way to eliminate the use of descendant() here, nor 
do I feel
that there is a need to do so. descendant() is still faster than ancestor() or 
especially
commonancestorsheads().

>
> >      # find interesting file sets from manifests
> > +    if graft:
> > +        repo.ui.debug("  computing unmatched files in rotated DAG\n")
> >      addedinm1 = m1.filesnotin(ma)
> >      addedinm2 = m2.filesnotin(ma)
> > -    u1, u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
> > +    _u1, _u2 = _computenonoverlap(repo, c1, c2, addedinm1, addedinm2)
>
> Can you give them better names? '_' prefix in local variables generally means
> they are placeholder (unused output) variables. (Perhaps we can start with a
> trivial patch that separates u1/2 and _u1/2, which will describe why we need
> them.)

u1r, u2r, u1u, u2u (for rotated/unrotated)? I'm wary of using longer names, as 
many lines
in copies.py are already borderline for the 80-character rule, and will require 
reformatting
with the introduction of longer variable names.

>
> > +    # combine partial copy paths discovered in the previous step
> > +    copyfrom, copyto = incomplete1, incomplete2
> > +    if dirty_c1:
> > +        copyfrom, copyto = incomplete2, incomplete1
> > +    for f in copyfrom:
> > +        if f in copyto:
> > +            copy[copyto[f]] = copyfrom[f]
> > +            del copyto[f]
> > +    for f in incompletediverge:
> > +        assert f not in diverge
> > +        ic = incompletediverge[f]
> > +        if ic[0] in copyto:
> > +            diverge[f] = [copyto[ic[0]], ic[1]]
>
> Perhaps this can be a separate function, which would avoid spilling out
> temporary variables, copyfrom/to.

Sounds doable.

>
> > +    _incomplete = _incomplete2
> > +    if dirty_c1:
> > +        _incomplete = _incomplete1
> > +        assert _incomplete2 == {}
> > +    else:
> > +        assert _incomplete1 == {}
> > +    for f in _diverge:
> > +        assert f not in bothdiverge
> > +        ic = _diverge[f]
> > +        if ic[0] in _incomplete:
> > +            bothdiverge[f] = [_incomplete[ic[0]], ic[1]]
> > +        elif ic[0] in (m1 if dirty_c1 else m2):
> > +            # backed-out rename on one side, but watch out for deleted 
> > files
> > +            bothdiverge[f] = ic
>
> This looks very similar to the previous incomplete1/2 loop. Can we make
> them share the implementation?

Will do if it can be done without giving up the asserts.

>
> > +def checkcopies(ctx, f, m1, m2, ca, cta, remote_ca, limit, diverge, copy,
> > +                fullcopy, incomplete, incompletediverge):
>
> nit: s/remote_ca/remoteca/ to conform to the coding style.

OK.

>
> > @@ -460,14 +525,25 @@
> >      f = the filename to check
> >      m1 = the source manifest
> >      m2 = the destination manifest
> > -    ca = the changectx of the common ancestor
> > +    ca = the changectx of the common ancestor, overridden on graft
> > +    cta = topological common ancestor for graft-like scenarios
> > +    remote_ca = True if ca is outside cta::m1, False otherwise
>
> nit: you mean cta::ctx?

Right, didn't realize that ctx points to the same revision as m1.

>
> And it would be nice if there's a brief comment when to use ca/ma over
> cta/mta.
>
> > -    seen = set([f])
> > -    for oc in getfctx(f, m1[f]).ancestors():
> > +    seen = {f: [getfctx(f, m1[f])]}
> > +    for oc in seen[f][0].ancestors():
> >          ocr = oc.linkrev()
> >          of = oc.path()
> >          if of in seen:
> > +            seen[of].append(oc)
> >              # check limit late - grab last rename before
> >              if ocr < limit:
> >                  break
> >              continue
> > -        seen.add(of)
> > +        seen[of] = [oc]
>
> nit: maybe we only need a dict of file nodes, not a dict of file contexts?

how do I get a file node id from a file context? ancestors() gives us contexts.

>
> > -        fullcopy[f] = of # remember for dir rename detection
> > +        # remember for dir rename detection
> > +        if backwards:
> > +            fullcopy[of] = f # grafting backwards through renames
> > +        else:
> > +            fullcopy[f] = of
> >          if of not in m2:
> >              continue # no match, keep looking
> >          if m2[of] == ma.get(of):
> > -            break # no merge needed, quit early
> > +            return # no merge needed, quit early
>
> So we no longer set diverge[of] = [f] this case. I don't know if it was
> necessary to populate 'renamedelete', but this change seems good for a
> separate patch to make sure it never break anything.

This isn't just some optimization, it's necessary for correct behavior of the
new checkcopies. And it would break the old checkcopies if applied separately
before the main patch.

>
> > -        cr = _related(oc, c2, ca.rev())
> > +        cr = _related(oc, c2, cta.rev())
> >          if cr and (of == f or of == c2.path()): # non-divergent
> > -            copy[f] = of
> > -            of = None
> > -            break
> > +            if backwards:
> > +                copy[of] = f
> > +            elif of in ma:
> > +                copy[f] = of
> > +            elif remote_ca: # special case: a <- b <- a -> b "ping-pong" 
> > rename
> > +                copy[of] = f
> > +                del fullcopy[f]
> > +                fullcopy[of] = f
> > +            else: # divergence w.r.t. graft CA on one side of topological 
> > CA
> > +                for sf in seen:
> > +                    if sf in ma and getfctx(sf, ma[sf]) in seen[sf]:
> > +                        assert sf not in diverge
> > +                        diverge[sf] = [f, of]
> > +                        break
> > +            return
> >
> > -    if of in ma:
> > -        diverge.setdefault(of, []).append(f)
> > +    if of in mta:
> > +        if backwards or remote_ca:
> > +            incomplete[of] = f
> > +        else:
> > +            for sf in seen:
> > +                if sf in ma and getfctx(sf, ma[sf]) in seen[sf]:
> > +                    if cta == ca:
> > +                        diverge.setdefault(sf, []).append(f)
> > +                    else:
> > +                        incompletediverge[sf] = [of, f]
> > +                    return
>
> Can we split this to 2 or 3 patches? I guess divergence handling can be a
> separate patch, for example. The point is we'll probably want to speed up
> the review/feedback process. I think copy tracing is inherently complex.
> Reviewing a monolithic patch about it would require a fair amount of
> continuous time, which would make hard to start reviewing this series.

Is that really necessary? It's a lot of additional work for me, to find 
intermediate states that may work,
and fix each of them so they don't break anything in the testsuite, only for 
those fixes to be reverted
by the next patch in the series. It would be the digital equipment of digging 
and filling pits in the ground.

Splitting isn't the hard part; ensuring no breakage in the intermediate states 
is.

>
> (I haven't read the tests and some part of incomplete* handling yet. I'll
> revisit them when I found time to.)
>
> Regards,
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
RE: [PATCH 1 of 2 v9] graft: support grafting across move/copy (issue4028)

Reply via email to