On Thu, Nov 24, 2016 at 07:28:31PM -0800, Gregory Szorc wrote: > # HG changeset patch > # User Gregory Szorc <gregory.sz...@gmail.com> > # Date 1480035243 28800 > # Thu Nov 24 16:54:03 2016 -0800 > # Node ID 328d95ac1fd128359a2665f2959e030ee8cce046 > # Parent 4fffbdc2bca22e666482b55562c0e592c8a68027 > repair: determine what upgrade will do > > This commit is rather large. It started off as multiple commits > but I couldn't figure out a good way to split up the functionality > while still conveying the overall strategy. To the reviewer, I > apologize. > > This commit introducts a ton of functionality around determining > how a repository can be upgraded. > > The first code you'll see is related to repository requirements and > updates. There are no less than 5 functions returnings sets of > requirements that control upgrading. Why so many functions? > Mainly extensions. These would ideally be module variables. But > extensions will need to inject themselves into the upgrade process. > And if you are going to make 1 set extensible, you might as well > make them all extensible. > > Astute readers will see that we don't support "manifestv2" and > "treemanifest" requirements in the upgrade mechanism. I don't have > a great answer for why other than this is a complex set of patches > and I don't want to deal with the complexity of these experimental > features just yet. We can teach the upgrade mechanism about them > later, once the basic upgrade mechanism is in place. > > The "upgradefindimprovements" function introduces a mechanism to > return a list of improvements that can be made to a repository. > Each improvement is effectively an action that an upgrade will > perform. Associated with each of these improvements is metadata > that will be used to inform users what's wrong and what an > upgrade will do. > > Each "improvement" is categorized as a "deficiency" or an > "optimization." TBH, I'm not thrilled about the terminology and > am receptive to constructive bikeshedding. The main difference > between a "deficiency" and an "optimization" is a deficiency > is always corrected (if it deviates from the current config) and > an "optimization" is an optional action that goes above and beyond > to improve the state of the repository (usually by requiring more > CPU during upgrade). > > Our initial set of improvements identifies missing repository > requirements, a single, easily correctable problem with > changelog storage, and a set of "optimizations" related to delta > recalculation. > > The main "upgraderepo" function glues all of this code together. > It verifies that the repository is capable of being upgraded by > looking at the state of current and proposed requirements and > every difference therein. It then queries for the list of > improvements and determines which of them will run based on the > current repository state and user arguments. A non-trivial amount > of code is required to print all the improvements and what > actions will be taken by an upgrade. Some of this is factored into > inline functions to facilitate use in a subsequent commit. > > I went through numerous iterations of the output format before > settling on a ReST-inspired definition list format. (I used > bulleted lists in the first submission of this commit and could > not get it to format just right.) Even with the various iterations, > I'm still not super thrilled with the format. But, this is a debug* > command, so that should mean we can refine the output without BC > concerns. > > Rounding out the code is a series of new tests. > > diff --git a/mercurial/debugcommands.py b/mercurial/debugcommands.py > --- a/mercurial/debugcommands.py > +++ b/mercurial/debugcommands.py > @@ -34,6 +34,7 @@ from . import ( > localrepo, > lock as lockmod, > pycompat, > + repair, > revlog, > scmutil, > setdiscovery, > @@ -873,4 +874,4 @@ def debugupgraderepo(ui, repo, run=False > should complete almost instantaneously and the chances of a consumer > being > unable to access the repository should be low. > """ > - raise error.Abort(_('not yet implemented')) > + return repair.upgraderepo(ui, repo, run=run, optimize=optimize) > diff --git a/mercurial/repair.py b/mercurial/repair.py > --- a/mercurial/repair.py > +++ b/mercurial/repair.py > @@ -360,3 +360,417 @@ def deleteobsmarkers(obsstore, indices): > newobsstorefile.write(bytes) > newobsstorefile.close() > return n
[...] > +def upgradefindimprovements(repo): > + """Determine improvements that can be made to the repo during upgrade. > + > + Returns a list of dicts describing repository deficiencies and > + optimizations. Each dict value has the following keys: Can we use namedtuples instead for this? > + > + name > + Machine-readable string uniquely identifying this improvement. It > + will be mapped to an action later in the upgrade process. > + > + type > + Either ``deficiency`` or ``optimization``. A deficiency is an obvious > + problem. An ``optimization`` is an action (sometimes optional) that > + can be taken to further improve the state of the repository. Some kind of enum constant setup for this instead of magic strings please. I'd be very happy with TYPE_DEFICIENCY and TYPE_OPTIMIZATION for now. > > + description > + Message intended for humans explaining the improvement in more detail, > + including the implications of it. For ``deficiency`` types, should be > + worded in the present tense. For ``optimization`` types, should be > + worded in the future tense. > + > + upgrademessage > + Message intended for humans explaining what an upgrade addressing this > + issue will do. Should be worded in the future tense. > + > + fromdefault (``deficiency`` types only) > + Boolean indicating whether the current (deficient) state deviates > + from Mercurial's default configuration. > + > + fromconfig (``deficiency`` types only) > + Boolean indicating whether the current (deficient) state deviates > + from the current Mercurial configuration. > + """ [...] > + > +def upgradedetermineactions(repo, improvements, sourcereqs, destreqs, > + optimize): > + """Determine upgrade actions that will be performed. > + > + Given a list of improvements as returned by ``upgradefindimprovements``, > + determine the list of upgrade actions that will be performed. > + > + The role of this function is to filter improvements if needed, apply > + recommended optimizations from the improvements list that make sense, > + etc. > + > + Returns a list of action names. > + """ > + newactions = [] > + > + knownreqs = upgradesupporteddestrequirements(repo) > + > + for i in improvements: > + name = i['name'] > + > + # If the action is a requirement that doesn't show up in the > + # destination requirements, prune the action. > + if name in knownreqs and name not in destreqs: > + continue > + > + if i['type'] == 'deficiency': > + newactions.append(name) > + > + newactions.extend(o for o in sorted(optimize) if o not in newactions) > + > + # FUTURE consider adding some optimizations here for certain transitions. > + # e.g. adding generaldelta could schedule parent redeltas. Perhaps add a 'suggests' field to your namedtuple that has a list of other transitions that may as well be done at the same time. Good idea for later. > + > + return newactions > + > +def upgraderepo(ui, repo, run=False, optimize=None): > + """Upgrade a repository in place.""" > + # Avoid cycle: cmdutil -> repair -> localrepo -> cmdutil > + from . import localrepo > + > + optimize = set(optimize or []) > + > + repo = repo.unfiltered() > + > + # Ensure the repository can be upgraded. > + missingreqs = upgraderequiredsourcerequirements(repo) - repo.requirements > + if missingreqs: > + raise error.Abort(_('cannot upgrade repository; requirement ' > + 'missing: %s') % _(', > ').join(sorted(missingreqs))) > + > + blockedreqs = upgradeblocksourcerequirements(repo) & repo.requirements It weirds me out a little th have this be a blacklist, the more I get into this, but I can't put my finger on a specific case that would be problematic. Yet, anyway. > + if blockedreqs: > + raise error.Abort(_('cannot upgrade repository; unsupported source ' > + 'requirement: %s') % > + _(', ').join(sorted(blockedreqs))) > + > + # FUTURE there is potentially a need to control the wanted requirements > via > + # command arguments or via an extension hook point. > + newreqs = localrepo.newreporequirements(repo) > + > + noremovereqs = (repo.requirements - newreqs - > + upgradesupportremovedrequirements(repo)) > + if noremovereqs: > + raise error.Abort(_('cannot upgrade repository; requirement would be > ' > + 'removed: %s') % _(', > ').join(sorted(noremovereqs))) > + > + noaddreqs = (newreqs - repo.requirements - > + upgradeallowednewrequirements(repo)) > + if noaddreqs: > + raise error.Abort(_('cannot upgrade repository; do not support > adding ' > + 'requirement: %s') % > + _(', ').join(sorted(noaddreqs))) > + > + unsupportedreqs = newreqs - upgradesupporteddestrequirements(repo) > + if unsupportedreqs: > + raise error.Abort(_('cannot upgrade repository; do not support ' > + 'destination requirement: %s') % > + _(', ').join(sorted(unsupportedreqs))) > + > + # Find and validate all improvements that can be made. > + improvements = upgradefindimprovements(repo) > + for i in improvements: > + if i['type'] not in ('deficiency', 'optimization'): > + raise error.Abort(_('unexpected improvement type %s for %s') % ( > + i['type'], i['name'])) > + > + # Validate arguments. > + unknownoptimize = optimize - set(i['name'] for i in improvements > + if i['type'] == 'optimization') > + if unknownoptimize: > + raise error.Abort(_('unknown optimization action requested: %s') % > + ', '.join(sorted(unknownoptimize)), > + hint=_('run without arguments to see valid ' > + 'optimizations')) > + > + actions = upgradedetermineactions(repo, improvements, repo.requirements, > + newreqs, optimize) > + > + def printrequirements(): > + ui.write(_('requirements\n')) > + ui.write(_(' preserved: %s\n') % > + _(', ').join(sorted(newreqs & repo.requirements))) > + > + if repo.requirements - newreqs: > + ui.write(_(' removed: %s\n') % > + _(', ').join(sorted(repo.requirements - newreqs))) > + > + if newreqs - repo.requirements: > + ui.write(_(' added: %s\n') % > + _(', ').join(sorted(newreqs - repo.requirements))) > + > + ui.write('\n') > + > + def printupgradeactions(): > + for action in actions: > + for i in improvements: > + if i['name'] == action: > + ui.write('%s\n %s\n\n' % > + (i['name'], i['upgrademessage'])) > + > + if not run: > + fromdefault = [] > + fromconfig = [] > + optimizations = [] > + > + for i in improvements: > + assert i['type'] in ('deficiency', 'optimization') > + if i['type'] == 'deficiency': > + if i['fromdefault']: > + fromdefault.append(i) > + if i['fromconfig']: > + fromconfig.append(i) > + else: > + optimizations.append(i) > + > + if fromdefault or fromconfig: > + fromconfignames = set(x['name'] for x in fromconfig) > + onlydefault = [i for i in fromdefault > + if i['name'] not in fromconfignames] > + > + if fromconfig: > + ui.write(_('repository lacks features recommended by ' > + 'current config options:\n\n')) > + for i in fromconfig: > + ui.write('%s\n %s\n\n' % (i['name'], i['description'])) > + > + if onlydefault: > + ui.write(_('repository lacks features used by the default ' > + 'config options:\n\n')) > + for i in onlydefault: > + ui.write('%s\n %s\n\n' % (i['name'], i['description'])) > + > + ui.write('\n') > + else: > + ui.write(_('(no feature deficiencies found in existing ' > + 'repository)\n')) > + > + ui.write(_('performing an upgrade with "--run" will make the > following ' > + 'changes:\n\n')) > + > + printrequirements() > + printupgradeactions() > + > + unusedoptimize = [i for i in improvements > + if i['name'] not in actions and > + i['type'] == 'optimization'] > + if unusedoptimize: > + ui.write(_('additional optimizations are available by specifying > ' > + '"--optimize <name>":\n\n')) > + for i in unusedoptimize: > + ui.write(_('%s\n %s\n\n') % (i['name'], i['description'])) > + > + return > diff --git a/tests/test-upgrade-repo.t b/tests/test-upgrade-repo.t > --- a/tests/test-upgrade-repo.t > +++ b/tests/test-upgrade-repo.t > @@ -1,5 +1,181 @@ > + $ cat >> $HGRCPATH << EOF > + > [extensions] > + > share = > + > EOF > + > +store and revlogv1 are required in source > + > + $ hg --config format.usestore=false init no-store > + $ hg -R no-store debugupgraderepo > + abort: cannot upgrade repository; requirement missing: store > + [255] > + > + $ hg init no-revlogv1 > + $ cat > no-revlogv1/.hg/requires << EOF > + > dotencode > + > fncache > + > generaldelta > + > store > + > EOF > + > + $ hg -R no-revlogv1 debugupgraderepo > + abort: cannot upgrade repository; requirement missing: revlogv1 > + [255] > + > +Cannot upgrade shared repositories > + > + $ hg init share-parent > + $ hg -q share share-parent share-child > + > + $ hg -R share-child debugupgraderepo > + abort: cannot upgrade repository; unsupported source requirement: shared > + [255] > + > +Do not yet support upgrading manifestv2 and treemanifest repos > + > + $ hg --config experimental.manifestv2=true init manifestv2 > + $ hg -R manifestv2 debugupgraderepo > + abort: cannot upgrade repository; unsupported source requirement: > manifestv2 > + [255] > + > + $ hg --config experimental.treemanifest=true init treemanifest > + $ hg -R treemanifest debugupgraderepo > + abort: cannot upgrade repository; unsupported source requirement: > treemanifest > + [255] > + > +Cannot add manifestv2 or treemanifest requirement during upgrade > + > + $ hg init disallowaddedreq > + $ hg -R disallowaddedreq --config experimental.manifestv2=true --config > experimental.treemanifest=true debugupgraderepo > + abort: cannot upgrade repository; do not support adding requirement: > manifestv2, treemanifest > + [255] > + > +An upgrade of a repository created with recommended settings only suggests > optimizations > + > $ hg init empty > $ cd empty > $ hg debugupgraderepo > - abort: not yet implemented > - [255] > + (no feature deficiencies found in existing repository) > + performing an upgrade with "--run" will make the following changes: > + > + requirements > + preserved: dotencode, fncache, generaldelta, revlogv1, store > + > + additional optimizations are available by specifying "--optimize <name>": > + > + redeltaparent > + deltas within internal storage will be recalculated to choose an > optimal base revision where this was not already done; the size of the > repository may shrink and various operations may become faster; the first > time this optimization is performed could slow down upgrade execution > considerably; subsequent invocations should not run noticeably slower > + > + redeltamultibase > + deltas within internal storage will be recalculated against multiple > base revision and the smallest difference will be used; the size of the > repository may shrink significantly when there are many merges; this > optimization will slow down execution in proportion to the number of merges > in the repository and the amount of files in the repository; this slow down > should not be significant unless there are tens of thousands of files and > thousands of merges > + > + redeltaall > + deltas within internal storage will always be recalculated without > reusing prior deltas; this will likely make execution run several times > slower; this optimization is typically not needed > + > + > +--optimize can be used to add optimizations > + > + $ hg debugupgrade --optimize redeltaparent > + (no feature deficiencies found in existing repository) > + performing an upgrade with "--run" will make the following changes: > + > + requirements > + preserved: dotencode, fncache, generaldelta, revlogv1, store > + > + redeltaparent > + deltas within internal storage will choose a new base revision if needed > + > + additional optimizations are available by specifying "--optimize <name>": > + > + redeltamultibase > + deltas within internal storage will be recalculated against multiple > base revision and the smallest difference will be used; the size of the > repository may shrink significantly when there are many merges; this > optimization will slow down execution in proportion to the number of merges > in the repository and the amount of files in the repository; this slow down > should not be significant unless there are tens of thousands of files and > thousands of merges > + > + redeltaall > + deltas within internal storage will always be recalculated without > reusing prior deltas; this will likely make execution run several times > slower; this optimization is typically not needed > + > + > +Various sub-optimal detections work > + > + $ cat > .hg/requires << EOF > + > revlogv1 > + > store > + > EOF > + > + $ hg debugupgraderepo > + repository lacks features recommended by current config options: > + > + fncache > + long and reserved filenames may not work correctly; repository > performance is sub-optimal > + > + dotencode > + storage of filenames beginning with a period or space may not work > correctly > + > + generaldelta > + deltas within internal storage are unable to choose optimal revisions; > repository is larger and slower than it could be; interaction with other > repositories may require extra network and CPU resources, making "hg push" > and "hg pull" slower > + > + > + performing an upgrade with "--run" will make the following changes: > + > + requirements > + preserved: revlogv1, store > + added: dotencode, fncache, generaldelta > + > + fncache > + repository will be more resilient to storing certain paths and > performance of certain operations should be improved > + > + dotencode > + repository will be better able to store files beginning with a space or > period > + > + generaldelta > + repository storage will be able to create optimal deltas; new > repository data will be smaller and read times should decrease; interacting > with other repositories using this storage model should require less network > and CPU resources, making "hg push" and "hg pull" faster > + > + additional optimizations are available by specifying "--optimize <name>": > + > + redeltaparent > + deltas within internal storage will be recalculated to choose an > optimal base revision where this was not already done; the size of the > repository may shrink and various operations may become faster; the first > time this optimization is performed could slow down upgrade execution > considerably; subsequent invocations should not run noticeably slower > + > + redeltamultibase > + deltas within internal storage will be recalculated against multiple > base revision and the smallest difference will be used; the size of the > repository may shrink significantly when there are many merges; this > optimization will slow down execution in proportion to the number of merges > in the repository and the amount of files in the repository; this slow down > should not be significant unless there are tens of thousands of files and > thousands of merges > + > + redeltaall > + deltas within internal storage will always be recalculated without > reusing prior deltas; this will likely make execution run several times > slower; this optimization is typically not needed > + > + > + $ hg --config format.dotencode=false debugupgraderepo > + repository lacks features recommended by current config options: > + > + fncache > + long and reserved filenames may not work correctly; repository > performance is sub-optimal > + > + generaldelta > + deltas within internal storage are unable to choose optimal revisions; > repository is larger and slower than it could be; interaction with other > repositories may require extra network and CPU resources, making "hg push" > and "hg pull" slower > + > + repository lacks features used by the default config options: > + > + dotencode > + storage of filenames beginning with a period or space may not work > correctly > + > + > + performing an upgrade with "--run" will make the following changes: > + > + requirements > + preserved: revlogv1, store > + added: fncache, generaldelta > + > + fncache > + repository will be more resilient to storing certain paths and > performance of certain operations should be improved > + > + generaldelta > + repository storage will be able to create optimal deltas; new > repository data will be smaller and read times should decrease; interacting > with other repositories using this storage model should require less network > and CPU resources, making "hg push" and "hg pull" faster > + > + additional optimizations are available by specifying "--optimize <name>": > + > + redeltaparent > + deltas within internal storage will be recalculated to choose an > optimal base revision where this was not already done; the size of the > repository may shrink and various operations may become faster; the first > time this optimization is performed could slow down upgrade execution > considerably; subsequent invocations should not run noticeably slower > + > + redeltamultibase > + deltas within internal storage will be recalculated against multiple > base revision and the smallest difference will be used; the size of the > repository may shrink significantly when there are many merges; this > optimization will slow down execution in proportion to the number of merges > in the repository and the amount of files in the repository; this slow down > should not be significant unless there are tens of thousands of files and > thousands of merges > + > + redeltaall > + deltas within internal storage will always be recalculated without > reusing prior deltas; this will likely make execution run several times > slower; this optimization is typically not needed > + > _______________________________________________ > Mercurial-devel mailing list > Mercurial-devel@mercurial-scm.org > https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel _______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel