This is a group reply because there is interesting bit in many different
emails, see the final link section below for details about the email I
refer to using [#].
Two different issues
=====================
Just to clarify: there is two different issues at play here:
*unintentional-cycle*: cycle of obsolescence markers is an unavoidable
possibility inherent to the fact we deal with a distributed system.
*intentional-cycle*: an idea from Jun to use "node-version" and
obsolescence cycle to revive obsolete changesets while keeping the same
hash. So far, locally creating cycle have been disallowed since dealing
with cycle is advanced and complex. Jun proposed to lift this in this
series [1].
Some background
===============
Jun and I had a short discussion Sunday late afternoon at the sprint
(referred in [2]). In that discussion, jun mentioned the idea of using
obsstore index to arbitrate cycle. At that point, I was thinking about
unintentional-cycle and I suggested digging in the direction of using
the "date" because it is a global property.
"unintentional-cycles" are expected to be "rare", dealing with them as
an error case that requires user intervention seems fine (as the rest of
instability handling). I dunno if using date will help presenting the
issue well to the end user, but that was at least a global properly so
any repo would see the same thing. Simplest handling I can think for
unintended cycle is (1) detect them (2) ask the user (3) create a
newnode (or maker but probably node) to record the solution (4) be done.
Main issue with current proposal: distribution
==============================================
There is multiple issues with the current proposal, but here the main
one that looks like a definitive blocker.
The node version proposal introduces a new concept of version for "node"
and "markers" and some logic to disables all markers with a version
lower than the node it affect. ("dates" of marker are used as node
version, but the same issue stand with generation maker).
The core issue here have been pointed by Gregory Szorc in [4]. Since our
system is distributed, it is impossible to make sure the date/generation
have the meaning they attempt to have.
Simple practical example with date:
-----------------------------------
[time-1] repo-B: changeset C-A is pulled from repo-A
[time-2] repo-A: changeset C-A is rewritten as C-B (time-2)
[time-3] repo-B: changeset C-C is marked as superceeded by C-A (time-3)
[time-4] repo-B: Changeset C-B (with marker) is pulled from repo-A
The markers pointing from C-B to C-A have a version "time-2" but "C-A"
now have a version of "time-3". "C-A" become wrongly un-obsoleted (and
C-B is in strange state). The fact C-A is un-obsoleted here is a bug and
do not match the meaning of the user action. This is a regression from
what evolution is currently able to achieve.
The fact patch 3 of this series had to update the evolve tests is a good
example of how the current approaches break some simple assumption in
current evolution behavior.
Simple practical example with generation number:
------------------------------------------------
Using generation number would make the simple example above works.
However they have the same class of issues in a sightly different way.
It is impossible to make sure the local generation number is in phase
with the other repositories. Here is another simple example using Jun
proposal to "revive" changesets:
repo-B: pull C-A from repo-A
repo-B: rewrite C-A as C-C (marker: A → B generation 0)
repo-B: undo the rewrite (marker: B → A generation 1)
repo-A: rewrite C-A as C-B (marker A → C generation 0)
repo-B: push C-A to repo-A (+ markers)
The markers pointing from C-B to C-A have a generation "0" but "C-A" now
have a version a generation of "1". "C-A" become wrongly un-obsoleted
(and C-B is in strange state). The fact C-A become un-obsoleted here is
a bug and do not match the meaning of the user action. This also a
regression from what evolution is currently able to achieves too.
Conclusion
----------
Approaches based on node and marker versions are unable to cope with the
distributed nature of Mercurial and evolution. One looking for the
hash-preservation-stone should go down more viable alley
Further Though
--------------
Something based on ordering markers themselves -might- still be
something leading to a solution (as still pointed by Greg in [4]).
However, by definition, markers creations might happens in parallel at
the same time in multiple repositories, and their is no defined order
between the two "branches". The complexity of tracking -and- using this
information seems high at first glances. Jun is talking about possible
dag building in [5]. The various property of obsmarkers and their
combinations with the changesets dag makes me doubt of that having a
viable complexity for such DAG, but I'm not seen anything concrete about
Jun idea yet so he might be ahead here.
Other issues with the current proposal
======================================
There is multiple facets of evolution broken by this proposal in its
current forms. I'm listing them as a guideline of things to pay
attention to when working on this problem.
Display of obsolescence history
-------------------------------
Since obsmarkers are plain filtered, they won't be visible to various UI
element using evolution to help the user (eg: revset, THG bits, pointing
to successors/precursors, etc).
This could probably be worked-around, but proposal on that front needs
to have proper testing and handling of this.
Exchange of markers
-------------------
The current filtering is also breaking the current exchange logic for
obsmarkers. Since obsmarker are filtered, they won't be properly be
detect and exchanges, leading to potential bug.
Again, we might be able work around this but this needs proper handling
and tests.
Detecting replaced heads during push
------------------------------------
The code checking for head addition in core can use the evolution
information to detect that the new heads replaces an old one. Using
"node" that are invisible to the discovery will throw this important
feature into disarray.
That needs testings and handling. Preserving hashes probably means we
need to update the wire-protocol to transfer more information.
New kind of troubles/unstability
--------------------------------
In the various example above, we end up with a rewrite (C-B) that is
visible at the same time as the version that is supersede (C-A). That is
an un-expected "unstable/troubled" state similar to when the same
situation appears because the predecessors is public.
Such situation will generated its own bench of issues similar to
divergence. It will have to be detected and dealt with. This adds a
significant complexity to the concept (and its implementation).
Detection and solution needs to be part of a proposal on that front.
About undoing action and reviving changeset with the same hash
==============================================================
Being able to "rewind/undo" rewrite action is an important part of the
user experience and potential strong force of evolution. This is
something we need to get done and this mentioned in the evolution
Roadmap[6].
Priority of hash preservation method
------------------------------------
However this is a distinct issue than "hash preservation". Sure being
able to rewind to a changeset with the same hash is a nice candy to
have. But rewinding to a different hash still does the job and can still
result in acceptable user experience.
Reusing the same hash is one of many way to approach the user experience
here. They are other option, multiple being simpler to achieve than hash
preservation. To empathize my point:
*** hash preservation is something nice, but it is seen as neither
*** critical nor necessary to make evolution move forward.
If Jun or other want to spend time searching solution for it, they
should. They are of course free of how they spend their time. But there
are nothing in the roadmap blocked on it.
As Sean mentioned [8], there is already many things to solve and not
than many people solving them. From over 5 years of user testing evolve,
I've received many feedback out the current quirk and trap of what we
have now. Hash changing is low in that list. A couple of user asked
about why the hash was changing and were satisfied with the reply. In
the mean time I've getting pingeg about monthly about the too limited
head-replacement-checking, rebase creating divergence, divergence
handling etc…
Example of alternative
----------------------
As I said, there is alternative available to provide a usable interface
to the user. The simplest thing we could start with is to have simple
"forwarding" from an old dead to its revived version. For this we just
needs:
* To record the "identity-replacement" in the obsmarkers in the flag
(we already have obsmarker flag).
* To detect user request to the old hash and forward it to the new hash,
probably with a small message. (we already have distinction between user
and internal node requests). (we should probably check the identity is
True at that step)
This is a small and basic solution, but this provide a simple example of
how we can make hash changes easier on the user.
In addition, avoiding large change/addition to our data model will help
moving faster on the other important issues.
Feedback on the process of this proposal
========================================
This proposal is a major change to the evolution concept. I find the way
it is brought up suboptimal in multiple points,
First, we are just out of a developer sprint. This could have been
scheduled as a formal one hour discussion slot to starts things out.
Second, This is really the kind of topic that deserve a PlanPage [7],
Listing goals, constrains and idea. That would provide information we
need to discuss this in a structured and easy to update way.
In addition, such major changes should come with a kill switch, probably
turned off by default, this would allow careful testing of the ideas
before going life for more people.
Moreover, there is mention of performances impact [7], but no numbers.
We need performance numbers for this kind of change.
Last, but not least, this kind of experiment should be introduced in the
evolution extension first. The evolve extensions is meant to be the
place were we experiment with new crazy idea. There are multiple reasons
for that
* Most of the advance/unsettled concept and commands still lives in the
extensions, and that is where they are tested. So checking how they
interact is better done in the extension.
* Evolve support multiple older version of Mercurial, this means it is
easier to deploy new ideas to current tester for feedback. The release
cycle is also more flexible than the mercurial-core one, allowing us to
ship new iteration and important bugfix faster.
* Usefulness and quality standard are lower since code in evolve will be
reviewed again before moving into core. So you can iterate faster.
Once we are happy enough about a concept and its implementation, it move
from the extensions to core.
Other
=====
*_source
--------
Jun is mentioning the '_source' extra in one of its commit message
([1]). It is worth underlining that the issue is a bit independent. That
'_source' attribute have been around since before evolution and have
user and UI interface unrelated to evolution. I don't see it going
anywhere in the future for backward compatibility reason.
(but sure, evolution benefit from the fact it exist)
inhibit
-------
This is a 3 years old hack that allow to "inhibit" the obsolescence on
some node. It was meant as a temporary hack at Facebook to benefit from
the lack of strip without exposing all users to the full evolution
experiment. The idea was to improve the pure evolve experience within
the next year and drop the hack. The hack do no play well with multiple
evolution concept (eg: troubles/instability detection, exchange). It was
never scheduled to be part of the core evolution experience. "support"
will eventually be dropped.
bad actor
---------
Bad actor are mentioned there [10]. If a bad actor have access to the
repo and obsolescence it probably have much simpler attack vector that
cycle. So I'm not too worried about that aspect.
Links:
======
[1]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/094593.html
[2]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095235.html
[3]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/094587.html
[4]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095593.html
[5]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095242.html
[6] https://www.mercurial-scm.org/wiki/CEDRoadMap
[7] https://www.mercurial-scm.org/wiki/WriteANewFeaturePlan
[8]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095334.html
[9]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095336.html
[10]
https://www.mercurial-scm.org/pipermail/mercurial-devel/2017-March/095405.html
On 03/13/2017 10:48 AM, Jun Wu wrote:
# HG changeset patch
# User Jun Wu <qu...@fb.com>
# Date 1489385975 25200
# Sun Mar 12 23:19:35 2017 -0700
# Node ID dec2b2328ef19c166f0ed1cb711b6c99dc9c590a
# Parent 8a17c541177f32348e248608b6a9dfd7fefdf517
# Available At https://bitbucket.org/quark-zju/hg-draft
# hg pull https://bitbucket.org/quark-zju/hg-draft -r dec2b2328ef1
obsolete: track node versions
This is the first step to support obs cycle nicely.
The core idea is straightforward,
A new marker "X -> Y" has the intention to make Y visible.
So if we know "X -> Y" is not older than another marker "Y -> ...", the
latter gets ignored. It's "not older than", not "newer", because then a
single changeset can be revived by "X -> X" (plus the "cancel out" property,
see below).
Then we just need to figure out how to sort the markers - i.e. defining
their "version"s. My very first idea is to use the offsets of markers stored
in obsstore, which is unambiguous. While marmoute suggested that the "date"
field makes more sense if markers are exchanged - the offsets are local
while the dates are global. And that's what this patch uses. If two markers
form a cycle and have a same date, they "cancel out" automatically, like
what darcs deals with conflicts - a reasonable behavior.
Note that sorting all markers naively by date works but is very slow. My
hg-committed has 11k markers. It takes nearly 1 second to sort them. Not to
mention sorting all markers will have difficulty with "mergemarkers" or
"addmarker", since they may require sorting all the markers again, which is
an unacceptable time complexity.
Therefore I came up with the dynamic filter idea. That's to make "node
versions" a separate state to track. And use it to filter "successors",
"precursors", etc. dynamically. Building nodeversions on my hg-committed
repo takes about 0.1 seconds, which looks pretty good. In theory it should
have similar time complexity to building "successors", which is acceptable.
diff --git a/mercurial/obsolete.py b/mercurial/obsolete.py
--- a/mercurial/obsolete.py
+++ b/mercurial/obsolete.py
@@ -499,4 +499,11 @@ def _addchildren(children, markers):
children.setdefault(p, set()).add(mark)
+@util.nogc
+def _bumpversions(nodeversions, markers):
+ for mark in markers:
+ date = mark[4][0]
+ for suc in mark[1]:
+ nodeversions[suc] = max(nodeversions.get(suc, -1), date)
+
def _checkinvalidmarkers(markers):
"""search for marker with invalid data and raise error if needed
@@ -535,4 +542,11 @@ class obsstore(object):
self._readonly = readonly
+ # the latest versions of possibly "revived" nodes. currently we use the
+ # creation date of markers as versions. so this is {node: date}
+ # A marker with successor=REVS will bump REVS to the version of the
+ # date of that marker.
+ # A marker with date <= nodeversions.get(precursor, -1) is invisible.
+ self._nodeversions = {}
+
def __iter__(self):
return iter(self._all)
@@ -643,4 +657,5 @@ class obsstore(object):
self._version, markers = _readmarkers(data)
markers = list(markers)
+ _bumpversions(self._nodeversions, markers)
_checkinvalidmarkers(markers)
return markers
@@ -676,4 +691,5 @@ class obsstore(object):
if self._cached('children'):
_addchildren(self.children, markers)
+ _bumpversions(self._nodeversions, markers)
_checkinvalidmarkers(markers)
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
--
Pierre-Yves David
_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel