Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

Gregory Szorc Mon, 12 Dec 2016 21:14:47 -0800

On Mon, Dec 12, 2016 at 2:04 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:


>
>
> On 11/25/2016 04:28 AM, Gregory Szorc wrote:
>
>> # HG changeset patch
>> # User Gregory Szorc <gregory.sz...@gmail.com>
>> # Date 1480031508 28800
>> #      Thu Nov 24 15:51:48 2016 -0800
>> # Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb
>> # Parent  906a7d8e969552536fffe0df7a5e63bf5d79b34b
>> revlog: add clone method
>>
>> Upcoming patches will introduce functionality for in-place
>> repository/store "upgrades." Copying the contents of a revlog
>> feels sufficiently low-level to warrant being in the revlog
>> class. So this commit implements that functionality.
>>
>> Because full delta recomputation can be *very* expensive (we're
>> talking several hours on the Firefox repository), we support
>> multiple modes of execution with regards to delta (re)use. This
>> will allow repository upgrades to choose the "level" of
>> processing/optimization they wish to perform when converting
>> revlogs.
>>
>> It's not obvious from this commit, but "addrevisioncb" will be
>> used for progress reporting.
>>
>> diff --git a/mercurial/revlog.py b/mercurial/revlog.py
>> --- a/mercurial/revlog.py
>> +++ b/mercurial/revlog.py
>> @@ -1818,3 +1818,95 @@ class revlog(object):
>>          if not self._inline:
>>              res.append(self.datafile)
>>          return res
>> +
>> +    DELTAREUSEALWAYS = 'always'
>> +    DELTAREUSESAMEREVS = 'samerevs'
>> +    DELTAREUSENEVER = 'never'
>>
>
> We don't use cap in our names, even for constant, these should be lower
> case.
>

Line 50 of revlog.py disagrees with you and I dislike not using UPPERCASE
for constants in the standard. But I'll change these to lowercase.


>
> I personnaly find "delta reuse" a bit obscure and would probably go for
> "rediff" or something like that instead. I'm fine with keeping "delta
> reuse" is you are not convinced.
>

"diff" is kinda overloaded and this functionality is related to *delta*
chains, so I think "delta" in the name is more appropriate.


>
> On a broader topic, we should probably get a config option for that. This
> is the kind of behavior we might what to be able to control on the
> client/server level. We want a config section dedicated to these (format
> does not seems appropriate, and "server" (used for validate) is not really
> really accurate either)
>
> (Yes, this is outside of the scope of this patch but a good spot to
> mention it)
>

Yes, that could be useful. But it is out of scope.


>
> +    def clone(self, tr, destrevlog, addrevisioncb=None,
>> +              deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None):
>> +        """Copy this revlog to another, possibly with format changes.
>> +
>> +        The destination revlog will contain the same revisions and nodes.
>> +        However, it may not be bit-for-bit identical due to e.g. delta
>> encoding
>> +        differences.
>>
>
> I'm curious about why we pass a destrevlog here instead of creating a new
> one. Is this because the creation of new revlog is going to be deferred to
> a "cloned stored" or something? Then why picking old.clone(new) direction
> instead of new.fromold(old) one?
>

It was an arbitrary choice.


>
> +        The ``deltareuse`` argument control how deltas from the existing
>> revlog
>> +        are preserved in the destination revlog. The argument can have
>> the
>> +        following values:
>> +
>> +        DELTAREUSEALWAYS
>> +           Deltas will always be reused (if possible), even if the
>> destination
>> +           revlog would not select the same revisions for the delta.
>> This is the
>> +           fastest mode of operation.
>> +        DELTAREUSESAMEREVS
>> +           Deltas will be reused if the destination revlog would pick
>> the same
>> +           revisions for the delta. This mode strikes a balance between
>> speed
>> +           and optimization.
>> +        DELTAREUSENEVER
>> +           Deltas will never be reused. This is the slowest mode of
>> execution.
>>
>
> Maybe mention that we are improving the diff algorithm and therefor it is
> useful to rediff sometime ?
>
> +        Delta computation can be slow, so the choice of delta reuse
>> policy can
>> +        significantly affect run time.
>> +
>> +        The default policy (``DELTAREUSESAMEREVS``) strikes a balance
>> between
>> +        two extremes. Deltas will be reused if they are appropriate. But
>> if the
>> +        delta could choose a better revision, it will do so. This means
>> if you
>> +        are converting a non-generaldelta revlog to a generaldelta
>> revlog,
>> +        deltas will be recomputed if the delta's parent isn't a parent
>> of the
>> +        revision.
>> +
>> +        In addition to the delta policy, the ``aggressivemergedeltas``
>> argument
>> +        controls whether to compute deltas against both parents for
>> merges.
>> +        By default, the current default is used.
>>
>
> This is unrelated to the current patch, but "aggressivemergedeltas" is a
> very obscure name for "consider both parent for delta". In addition the
> "official" option seems to live in the "format" config section while this
> is not a format option. So if we could eventually clean that up at some
> point that would be great.
>

I agree the naming is not great. It is an undocumented option, so we should
be able to clean it up in the future.


>
> +        """
>> +        if deltareuse not in (self.DELTAREUSEALWAYS,
>> self.DELTAREUSESAMEREVS,
>> +                self.DELTAREUSENEVER):
>>
>
> We should get a set of valid flag on the class itself, that would make
> thing simpler:
>
>   if deltause not in self._validdeltastrategies
>
> +            raise ValueError(_('value for deltareuse invalid: %s') %
>> deltareuse)
>> +
>> +        if len(destrevlog):
>> +            raise ValueError(_('destination revlog is not empty'))
>> +
>> +        # lazydeltabase controls whether to reuse a cached delta, if
>> possible.
>> +        oldlazydeltabase = destrevlog._lazydeltabase
>> +        oldamd = destrevlog._aggressivemergedeltas
>> +
>> +        if deltareuse == self.DELTAREUSEALWAYS:
>> +            destrevlog._lazydeltabase = True
>> +        elif deltareuse == self.DELTAREUSESAMEREVS:
>> +            destrevlog._lazydeltabase = False
>>
>
> The assignment should happen within the 'try:', otherwise the old value
> might fail to be reinstalled.
>
> +
>> +        destrevlog._aggressivemergedeltas = aggressivemergedeltas or
>> oldamd
>>
>
> Same here, assignment should happen in the try.
>
> +        populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS,
>> +                                            self.DELTAREUSESAMEREVS)
>> +
>> +        try:
>> +            index = self.index
>> +            for rev in self:
>> +                entry = index[rev]
>> +
>> +                # Some classes override linkrev to take filtered revs
>> into
>> +                # account. Use raw entry from index.
>>
>
> I don't think cloning on filtered revlog make sense. We should enforce
> clone to run on unfiltered ones.
>

Good catch.


>
> +                linkrev = entry[4]
>> +                p1 = index[entry[5]][7]
>> +                p2 = index[entry[6]][7]
>> +                node = entry[7]
>> +                text = self.revision(rev)
>>
>
> IF we reuse delta, getting the revision here might lead to necessary
> processing. Am I missing something?


It will. Not a huge hit since it is just a patch for every revision since
the base revision should be cached, but a hit nonetheless. I'll fix this.
Be warned: this will require a little extra code to duplicate functionality
in addrevision().


>
>
> +                # (Possibly) reuse the delta from the revlog if allowed
>> and
>> +                # the revlog chunk is a delta.
>> +                cachedelta = None
>> +                if populatecachedelta:
>> +                    dp = self.deltaparent(rev)
>> +                    if dp != nullrev:
>> +                        cachedelta = (dp, str(self._chunk(rev)))
>> +
>> +                destrevlog.addrevision(text, tr, linkrev, p1, p2,
>> +                                       cachedelta=cachedelta, node=node)
>> +
>> +                if addrevisioncb:
>> +                    addrevisioncb(self, rev, node)
>> +        finally:
>> +            destrevlog._lazydeltabase = oldlazydeltabase
>> +            destrevlog._aggressivemergedeltas = oldamd
>>
>
>
> --
> Pierre-Yves David
>

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel

Re: [PATCH 1 of 7 upgraderepo V2] revlog: add clone method

Reply via email to