On Fri, Apr 30, 2021 at 6:45 PM Joerg Sonnenberger <jo...@bec.de> wrote: > > Hello all, > this is a bit of a brain dump based on a recent discussion with > Pierre-Yves on IRC. I think it deserves some input from others and maybe > it can grow into a proper plan in time. > > For the context of this discussion, see > https://phab.mercurial-scm.org/D10082 > and > https://bz.mercurial-scm.org/show_bug.cgi?id=6219 > In short, the changelog entry itself is supposed to keep a list of files > touched by the change, but there are two major shortcomings: > (1) It doesn't show removals. > (2) Older versions notoriously mishandled merges. > > This is painful not just for the review mentioned above, but various > other places that want to efficiently track what files could have been > affected by a changeset. This is made even worse by the inability to do > get fast manifest deltas. The current manifest code again has two major > shortcomings: > (1) It only works against the delta parent and that isn't necessarily > one of the changeset parents. > (2) It doesn't show removals. > > Now I have been discussing "Bonsai" changesets as option for a while, > but that would certainly introduce a major breakage and is therefore > something that must be opt-in. But the idea by itself has enough merits > and brings the question on whether we can't compute the equivalent as > (non-optional) cache. What do we want in such a cache? We want to allow > walking from parent to child or the other way around and efficiently > compute the new file nodes. For merge commits, it should be possible to > traverse to either parent efficently. The first idea would be to store > lists of (path, old node, new node) for non-merges and (path, p1 node, > p2 node, new node) for merge commits. Added/removed files are using a > special marker node like <null> like in other places. This would be > require variable size entries (more parsing) and quite a bit of text > storage for big/deep repos. That's undesirable, so the first improvement > would be to have a side table of the path names similar to how we handle > it in the rev-branch-cache. The second option would be to store file > revisions and not nodes, but that has some concerns about strip handling. > It would reduce the size of the entries by factor of 4 to 7 though. Using > such a data structure for traversing is easy, just figure out which > parent is the desired parent and pick the appropiate old/new entries. > > Computing this structure is bit tricky as we do not want to ever compute > the full text of parent or child. So we need to process the diffs > directly. For that, we need to know the starting position of each file. > The end of each line we compute easily based on the node len and the > path name. We can exploit the special nature of the manifest: all lines > are sorted and path is unique. Based on that we can build an augmented > binary tree that keeps track of the size of its subtrees. The tree can > be shared with the delta base and updated bottom-up, e.g. as > red-black-tree that would require O(d log n) new nodes to be written > where d is the number of lines touched and n the total number of lines > in the manifest. For merges or if the delta is not against the parent of > a changeset, a bit more work is necessary to compute the precise diff. > This can be done either by walking the path between the nodes or by > recursively comparing the trees. I would expect that in most cases the > recursive compare works quite well as most tree nodes should end up > being shared for the typical short term forks and the like. > > Thoughts?
Sounds tricky, but possibly useful. A wrinkle: how would this interact with narrowing? Would the cache you describe just have to be rebuilt on a widen/narrow event? Assume that reifying the entire changelog isn't feasible for sufficiently large repositories on normal workstations... (Other than that, no thoughts to add past what PYD had) > > Joerg > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel