Re: [PATCH 1 of 4 V2] obsolete: track node versions

Pierre-Yves David Mon, 27 Mar 2017 00:16:08 -0700

This is a group reply because there is interesting bit in many differentemails, see the final link section below for details about the email Irefer to using [#].


Two different issues
=====================


Just to clarify: there is two different issues at play here:

*unintentional-cycle*: cycle of obsolescence markers is an unavoidablepossibility inherent to the fact we deal with a distributed system.

*intentional-cycle*: an idea from Jun to use "node-version" andobsolescence cycle to revive obsolete changesets while keeping the samehash. So far, locally creating cycle have been disallowed since dealingwith cycle is advanced and complex. Jun proposed to lift this in thisseries [1].



Some background
===============

Jun and I had a short discussion Sunday late afternoon at the sprint(referred in [2]). In that discussion, jun mentioned the idea of usingobsstore index to arbitrate cycle. At that point, I was thinking aboutunintentional-cycle and I suggested digging in the direction of usingthe "date" because it is a global property.

"unintentional-cycles" are expected to be "rare", dealing with them asan error case that requires user intervention seems fine (as the rest ofinstability handling). I dunno if using date will help presenting theissue well to the end user, but that was at least a global properly soany repo would see the same thing. Simplest handling I can think forunintended cycle is (1) detect them (2) ask the user (3) create anewnode (or maker but probably node) to record the solution (4) be done.



Main issue with current proposal: distribution
==============================================

There is multiple issues with the current proposal, but here the mainone that looks like a definitive blocker.

The node version proposal introduces a new concept of version for "node"and "markers" and some logic to disables all markers with a versionlower than the node it affect. ("dates" of marker are used as nodeversion, but the same issue stand with generation maker).

The core issue here have been pointed by Gregory Szorc in [4]. Since oursystem is distributed, it is impossible to make sure the date/generationhave the meaning they attempt to have.


Simple practical example with date:
-----------------------------------

[time-1] repo-B: changeset C-A is pulled from repo-A
[time-2] repo-A: changeset C-A is rewritten as C-B (time-2)
[time-3] repo-B: changeset C-C is marked as superceeded by C-A (time-3)
[time-4] repo-B: Changeset C-B (with marker) is pulled from repo-A

The markers pointing from C-B to C-A have a version "time-2" but "C-A"now have a version of "time-3". "C-A" become wrongly un-obsoleted (andC-B is in strange state). The fact C-A is un-obsoleted here is a bug anddo not match the meaning of the user action. This is a regression fromwhat evolution is currently able to achieve.

The fact patch 3 of this series had to update the evolve tests is a goodexample of how the current approaches break some simple assumption incurrent evolution behavior.


Simple practical example with generation number:
------------------------------------------------

Using generation number would make the simple example above works.However they have the same class of issues in a sightly different way.It is impossible to make sure the local generation number is in phasewith the other repositories. Here is another simple example using Junproposal to "revive" changesets:


repo-B: pull C-A from repo-A
repo-B: rewrite C-A as C-C (marker: A → B generation 0)
repo-B: undo the rewrite   (marker: B → A generation 1)
repo-A: rewrite C-A as C-B (marker A → C generation 0)
repo-B: push C-A to repo-A (+ markers)

The markers pointing from C-B to C-A have a generation "0" but "C-A" nowhave a version a generation of "1". "C-A" become wrongly un-obsoleted(and C-B is in strange state). The fact C-A become un-obsoleted here isa bug and do not match the meaning of the user action. This also aregression from what evolution is currently able to achieves too.


Conclusion
----------

Approaches based on node and marker versions are unable to cope with thedistributed nature of Mercurial and evolution. One looking for thehash-preservation-stone should go down more viable alley


Further Though
--------------

Something based on ordering markers themselves -might- still besomething leading to a solution (as still pointed by Greg in [4]).

However, by definition, markers creations might happens in parallel atthe same time in multiple repositories, and their is no defined orderbetween the two "branches". The complexity of tracking -and- using thisinformation seems high at first glances. Jun is talking about possibledag building in [5]. The various property of obsmarkers and theircombinations with the changesets dag makes me doubt of that having aviable complexity for such DAG, but I'm not seen anything concrete aboutJun idea yet so he might be ahead here.



Other issues with the current proposal
======================================

There is multiple facets of evolution broken by this proposal in itscurrent forms. I'm listing them as a guideline of things to payattention to when working on this problem.


Display of obsolescence history
-------------------------------

Since obsmarkers are plain filtered, they won't be visible to various UIelement using evolution to help the user (eg: revset, THG bits, pointingto successors/precursors, etc).

This could probably be worked-around, but proposal on that front needsto have proper testing and handling of this.


Exchange of markers
-------------------

The current filtering is also breaking the current exchange logic forobsmarkers. Since obsmarker are filtered, they won't be properly bedetect and exchanges, leading to potential bug.

Again, we might be able work around this but this needs proper handlingand tests.


Detecting replaced heads during push
------------------------------------

The code checking for head addition in core can use the evolutioninformation to detect that the new heads replaces an old one. Using"node" that are invisible to the discovery will throw this importantfeature into disarray.

That needs testings and handling. Preserving hashes probably means weneed to update the wire-protocol to transfer more information.


New kind of troubles/unstability
--------------------------------

In the various example above, we end up with a rewrite (C-B) that isvisible at the same time as the version that is supersede (C-A). That isan un-expected "unstable/troubled" state similar to when the samesituation appears because the predecessors is public.

Such situation will generated its own bench of issues similar todivergence. It will have to be detected and dealt with. This adds asignificant complexity to the concept (and its implementation).


Detection and solution needs to be part of a proposal on that front.


About undoing action and reviving changeset with the same hash
==============================================================

Being able to "rewind/undo" rewrite action is an important part of theuser experience and potential strong force of evolution. This issomething we need to get done and this mentioned in the evolutionRoadmap[6].


Priority of hash preservation method
------------------------------------

However this is a distinct issue than "hash preservation". Sure beingable to rewind to a changeset with the same hash is a nice candy tohave. But rewinding to a different hash still does the job and can stillresult in acceptable user experience.

Reusing the same hash is one of many way to approach the user experiencehere. They are other option, multiple being simpler to achieve than hashpreservation. To empathize my point:


*** hash preservation is something nice, but it is seen as neither
*** critical nor necessary to make evolution move forward.

If Jun or other want to spend time searching solution for it, theyshould. They are of course free of how they spend their time. But thereare nothing in the roadmap blocked on it.

As Sean mentioned [8], there is already many things to solve and notthan many people solving them. From over 5 years of user testing evolve,I've received many feedback out the current quirk and trap of what wehave now. Hash changing is low in that list. A couple of user askedabout why the hash was changing and were satisfied with the reply. Inthe mean time I've getting pingeg about monthly about the too limitedhead-replacement-checking, rebase creating divergence, divergencehandling etc…


Example of alternative
----------------------

As I said, there is alternative available to provide a usable interfaceto the user. The simplest thing we could start with is to have simple"forwarding" from an old dead to its revived version. For this we justneeds:


* To record the "identity-replacement" in the obsmarkers in the flag
  (we already have obsmarker flag).

* To detect user request to the old hash and forward it to the new hash,probably with a small message. (we already have distinction between userand internal node requests). (we should probably check the identity isTrue at that step)

This is a small and basic solution, but this provide a simple example ofhow we can make hash changes easier on the user.

In addition, avoiding large change/addition to our data model will helpmoving faster on the other important issues.


Feedback on the process of this proposal
========================================

This proposal is a major change to the evolution concept. I find the wayit is brought up suboptimal in multiple points,

First, we are just out of a developer sprint. This could have beenscheduled as a formal one hour discussion slot to starts things out.

Second, This is really the kind of topic that deserve a PlanPage [7],Listing goals, constrains and idea. That would provide information weneed to discuss this in a structured and easy to update way.

In addition, such major changes should come with a kill switch, probablyturned off by default, this would allow careful testing of the ideasbefore going life for more people.

Moreover, there is mention of performances impact [7], but no numbers.We need performance numbers for this kind of change.

Last, but not least, this kind of experiment should be introduced in theevolution extension first. The evolve extensions is meant to be theplace were we experiment with new crazy idea. There are multiple reasonsfor that

* Most of the advance/unsettled concept and commands still lives in theextensions, and that is where they are tested. So checking how theyinteract is better done in the extension.

* Evolve support multiple older version of Mercurial, this means it iseasier to deploy new ideas to current tester for feedback. The releasecycle is also more flexible than the mercurial-core one, allowing us toship new iteration and important bugfix faster.

* Usefulness and quality standard are lower since code in evolve will bereviewed again before moving into core. So you can iterate faster.

Once we are happy enough about a concept and its implementation, it movefrom the extensions to core.


Other
=====

*_source
--------

Jun is mentioning the '_source' extra in one of its commit message([1]). It is worth underlining that the issue is a bit independent. That'_source' attribute have been around since before evolution and haveuser and UI interface unrelated to evolution. I don't see it goinganywhere in the future for backward compatibility reason.


(but sure, evolution benefit from the fact it exist)

inhibit
-------

This is a 3 years old hack that allow to "inhibit" the obsolescence onsome node. It was meant as a temporary hack at Facebook to benefit fromthe lack of strip without exposing all users to the full evolutionexperiment. The idea was to improve the pure evolve experience withinthe next year and drop the hack. The hack do no play well with multipleevolution concept (eg: troubles/instability detection, exchange). It wasnever scheduled to be part of the core evolution experience. "support"will eventually be dropped.


bad actor
---------

Bad actor are mentioned there [10]. If a bad actor have access to therepo and obsolescence it probably have much simpler attack vector thatcycle. So I'm not too worried about that aspect.


Links:
======

[1]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/094593.html[2]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095235.html[3]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/094587.html[4]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095593.html[5]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095242.html

[6] https://www.mercurial-scm.org/wiki/CEDRoadMap
[7] https://www.mercurial-scm.org/wiki/WriteANewFeaturePlan

[8]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095334.html[9]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095336.html[10]https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095405.html




On 03/13/2017 10:48 AM, Jun Wu wrote:

# HG changeset patch
# User Jun Wu <qu...@fb.com>
# Date 1489385975 25200
#      Sun Mar 12 23:19:35 2017 -0700
# Node ID dec2b2328ef19c166f0ed1cb711b6c99dc9c590a
# Parent  8a17c541177f32348e248608b6a9dfd7fefdf517
# Available At https://bitbucket.org/quark-zju/hg-draft
#              hg pull https://bitbucket.org/quark-zju/hg-draft -r dec2b2328ef1
obsolete: track node versions

This is the first step to support obs cycle nicely.

The core idea is straightforward,

  A new marker "X -> Y" has the intention to make Y visible.

So if we know "X -> Y" is not older than another marker "Y -> ...", the
latter gets ignored. It's "not older than", not "newer", because then a
single changeset can be revived by "X -> X" (plus the "cancel out" property,
see below).

Then we just need to figure out how to sort the markers - i.e. defining
their "version"s. My very first idea is to use the offsets of markers stored
in obsstore, which is unambiguous. While marmoute suggested that the "date"
field makes more sense if markers are exchanged - the offsets are local
while the dates are global. And that's what this patch uses. If two markers
form a cycle and have a same date, they "cancel out" automatically, like
what darcs deals with conflicts - a reasonable behavior.

Note that sorting all markers naively by date works but is very slow. My
hg-committed has 11k markers. It takes nearly 1 second to sort them. Not to
mention sorting all markers will have difficulty with "mergemarkers" or
"addmarker", since they may require sorting all the markers again, which is
an unacceptable time complexity.

Therefore I came up with the dynamic filter idea. That's to make "node
versions" a separate state to track. And use it to filter "successors",
"precursors", etc. dynamically. Building nodeversions on my hg-committed
repo takes about 0.1 seconds, which looks pretty good. In theory it should
have similar time complexity to building "successors", which is acceptable.

diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -499,4 +499,11 @@ def _addchildren(children, markers):
                 children.setdefault(p, set()).add(mark)

+@util.nogc
+def _bumpversions(nodeversions, markers):
+    for mark in markers:
+        date = mark[4][0]
+        for suc in mark[1]:
+            nodeversions[suc] = max(nodeversions.get(suc, -1), date)
+
 def _checkinvalidmarkers(markers):
     """search for marker with invalid data and raise error if needed
@@ -535,4 +542,11 @@ class obsstore(object):
         self._readonly = readonly

+        # the latest versions of possibly "revived" nodes. currently we use the
+        # creation date of markers as versions. so this is {node: date}
+        # A marker with successor=REVS will bump REVS to the version of the
+        # date of that marker.
+        # A marker with date <= nodeversions.get(precursor, -1) is invisible.
+        self._nodeversions = {}
+
     def __iter__(self):
         return iter(self._all)
@@ -643,4 +657,5 @@ class obsstore(object):
         self._version, markers = _readmarkers(data)
         markers = list(markers)
+        _bumpversions(self._nodeversions, markers)
         _checkinvalidmarkers(markers)
         return markers
@@ -676,4 +691,5 @@ class obsstore(object):
         if self._cached('children'):
             _addchildren(self.children, markers)
+        _bumpversions(self._nodeversions, markers)
         _checkinvalidmarkers(markers)

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH 1 of 4 V2] obsolete: track node versions

Reply via email to