On Fri, Apr 30, 2021 at 6:45 PM Joerg Sonnenberger <jo...@bec.de> wrote:
>
> Hello all,
> this is a bit of a brain dump based on a recent discussion with
> Pierre-Yves on IRC. I think it deserves some input from others and maybe
> it can grow into a proper plan in time.
>
> For the context of this discussion, see
>   https://phab.mercurial-scm.org/D10082
> and
>   https://bz.mercurial-scm.org/show_bug.cgi?id=6219
> In short, the changelog entry itself is supposed to keep a list of files
> touched by the change, but there are two major shortcomings:
> (1) It doesn't show removals.
> (2) Older versions notoriously mishandled merges.
>
> This is painful not just for the review mentioned above, but various
> other places that want to efficiently track what files could have been
> affected by a changeset. This is made even worse by the inability to do
> get fast manifest deltas. The current manifest code again has two major
> shortcomings:
> (1) It only works against the delta parent and that isn't necessarily
> one of the changeset parents.
> (2) It doesn't show removals.
>
> Now I have been discussing "Bonsai" changesets as option for a while,
> but that would certainly introduce a major breakage and is therefore
> something that must be opt-in. But the idea by itself has enough merits
> and brings the question on whether we can't compute the equivalent as
> (non-optional) cache. What do we want in such a cache? We want to allow
> walking from parent to child or the other way around and efficiently
> compute the new file nodes. For merge commits, it should be possible to
> traverse to either parent efficently. The first idea would be to store
> lists of (path, old node, new node) for non-merges and (path, p1 node,
> p2 node, new node) for merge commits. Added/removed files are using a
> special marker node like <null> like in other places. This would be
> require variable size entries (more parsing) and quite a bit of text
> storage for big/deep repos. That's undesirable, so the first improvement
> would be to have a side table of the path names similar to how we handle
> it in the rev-branch-cache. The second option would be to store file
> revisions and not nodes, but that has some concerns about strip handling.
> It would reduce the size of the entries by factor of 4 to 7 though. Using
> such a data structure for traversing is easy, just figure out which
> parent is the desired parent and pick the appropiate old/new entries.
>
> Computing this structure is bit tricky as we do not want to ever compute
> the full text of parent or child. So we need to process the diffs
> directly. For that, we need to know the starting position of each file.
> The end of each line we compute easily based on the node len and the
> path name. We can exploit the special nature of the manifest: all lines
> are sorted and path is unique. Based on that we can build an augmented
> binary tree that keeps track of the size of its subtrees. The tree can
> be shared with the delta base and updated bottom-up, e.g. as
> red-black-tree that would require O(d log n) new nodes to be written
> where d is the number of lines touched and n the total number of lines
> in the manifest. For merges or if the delta is not against the parent of
> a changeset, a bit more work is necessary to compute the precise diff.
> This can be done either by walking the path between the nodes or by
> recursively comparing the trees. I would expect that in most cases the
> recursive compare works quite well as most tree nodes should end up
> being shared for the typical short term forks and the like.
>
> Thoughts?

Sounds tricky, but possibly useful. A wrinkle: how would this interact
with narrowing? Would the cache you describe just have to be rebuilt
on a widen/narrow event? Assume that reifying the entire changelog
isn't feasible for sufficiently large repositories on normal
workstations...

(Other than that, no thoughts to add past what PYD had)

>
> Joerg
> _______________________________________________
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Reply via email to