On Mon, Dec 12, 2016 at 2:04 AM, Pierre-Yves David < pierre-yves.da...@ens-lyon.org> wrote:
> > > On 11/25/2016 04:28 AM, Gregory Szorc wrote: > >> # HG changeset patch >> # User Gregory Szorc <gregory.sz...@gmail.com> >> # Date 1480031508 28800 >> # Thu Nov 24 15:51:48 2016 -0800 >> # Node ID c60995fce14b0e34bd1416dab3712a6c33bb29bb >> # Parent 906a7d8e969552536fffe0df7a5e63bf5d79b34b >> revlog: add clone method >> >> Upcoming patches will introduce functionality for in-place >> repository/store "upgrades." Copying the contents of a revlog >> feels sufficiently low-level to warrant being in the revlog >> class. So this commit implements that functionality. >> >> Because full delta recomputation can be *very* expensive (we're >> talking several hours on the Firefox repository), we support >> multiple modes of execution with regards to delta (re)use. This >> will allow repository upgrades to choose the "level" of >> processing/optimization they wish to perform when converting >> revlogs. >> >> It's not obvious from this commit, but "addrevisioncb" will be >> used for progress reporting. >> >> diff --git a/mercurial/revlog.py b/mercurial/revlog.py >> --- a/mercurial/revlog.py >> +++ b/mercurial/revlog.py >> @@ -1818,3 +1818,95 @@ class revlog(object): >> if not self._inline: >> res.append(self.datafile) >> return res >> + >> + DELTAREUSEALWAYS = 'always' >> + DELTAREUSESAMEREVS = 'samerevs' >> + DELTAREUSENEVER = 'never' >> > > We don't use cap in our names, even for constant, these should be lower > case. > Line 50 of revlog.py disagrees with you and I dislike not using UPPERCASE for constants in the standard. But I'll change these to lowercase. > > I personnaly find "delta reuse" a bit obscure and would probably go for > "rediff" or something like that instead. I'm fine with keeping "delta > reuse" is you are not convinced. > "diff" is kinda overloaded and this functionality is related to *delta* chains, so I think "delta" in the name is more appropriate. > > On a broader topic, we should probably get a config option for that. This > is the kind of behavior we might what to be able to control on the > client/server level. We want a config section dedicated to these (format > does not seems appropriate, and "server" (used for validate) is not really > really accurate either) > > (Yes, this is outside of the scope of this patch but a good spot to > mention it) > Yes, that could be useful. But it is out of scope. > > + def clone(self, tr, destrevlog, addrevisioncb=None, >> + deltareuse=DELTAREUSESAMEREVS, aggressivemergedeltas=None): >> + """Copy this revlog to another, possibly with format changes. >> + >> + The destination revlog will contain the same revisions and nodes. >> + However, it may not be bit-for-bit identical due to e.g. delta >> encoding >> + differences. >> > > I'm curious about why we pass a destrevlog here instead of creating a new > one. Is this because the creation of new revlog is going to be deferred to > a "cloned stored" or something? Then why picking old.clone(new) direction > instead of new.fromold(old) one? > It was an arbitrary choice. > > + The ``deltareuse`` argument control how deltas from the existing >> revlog >> + are preserved in the destination revlog. The argument can have >> the >> + following values: >> + >> + DELTAREUSEALWAYS >> + Deltas will always be reused (if possible), even if the >> destination >> + revlog would not select the same revisions for the delta. >> This is the >> + fastest mode of operation. >> + DELTAREUSESAMEREVS >> + Deltas will be reused if the destination revlog would pick >> the same >> + revisions for the delta. This mode strikes a balance between >> speed >> + and optimization. >> + DELTAREUSENEVER >> + Deltas will never be reused. This is the slowest mode of >> execution. >> > > Maybe mention that we are improving the diff algorithm and therefor it is > useful to rediff sometime ? > > + Delta computation can be slow, so the choice of delta reuse >> policy can >> + significantly affect run time. >> + >> + The default policy (``DELTAREUSESAMEREVS``) strikes a balance >> between >> + two extremes. Deltas will be reused if they are appropriate. But >> if the >> + delta could choose a better revision, it will do so. This means >> if you >> + are converting a non-generaldelta revlog to a generaldelta >> revlog, >> + deltas will be recomputed if the delta's parent isn't a parent >> of the >> + revision. >> + >> + In addition to the delta policy, the ``aggressivemergedeltas`` >> argument >> + controls whether to compute deltas against both parents for >> merges. >> + By default, the current default is used. >> > > This is unrelated to the current patch, but "aggressivemergedeltas" is a > very obscure name for "consider both parent for delta". In addition the > "official" option seems to live in the "format" config section while this > is not a format option. So if we could eventually clean that up at some > point that would be great. > I agree the naming is not great. It is an undocumented option, so we should be able to clean it up in the future. > > + """ >> + if deltareuse not in (self.DELTAREUSEALWAYS, >> self.DELTAREUSESAMEREVS, >> + self.DELTAREUSENEVER): >> > > We should get a set of valid flag on the class itself, that would make > thing simpler: > > if deltause not in self._validdeltastrategies > > + raise ValueError(_('value for deltareuse invalid: %s') % >> deltareuse) >> + >> + if len(destrevlog): >> + raise ValueError(_('destination revlog is not empty')) >> + >> + # lazydeltabase controls whether to reuse a cached delta, if >> possible. >> + oldlazydeltabase = destrevlog._lazydeltabase >> + oldamd = destrevlog._aggressivemergedeltas >> + >> + if deltareuse == self.DELTAREUSEALWAYS: >> + destrevlog._lazydeltabase = True >> + elif deltareuse == self.DELTAREUSESAMEREVS: >> + destrevlog._lazydeltabase = False >> > > The assignment should happen within the 'try:', otherwise the old value > might fail to be reinstalled. > > + >> + destrevlog._aggressivemergedeltas = aggressivemergedeltas or >> oldamd >> > > Same here, assignment should happen in the try. > > + populatecachedelta = deltareuse in (self.DELTAREUSEALWAYS, >> + self.DELTAREUSESAMEREVS) >> + >> + try: >> + index = self.index >> + for rev in self: >> + entry = index[rev] >> + >> + # Some classes override linkrev to take filtered revs >> into >> + # account. Use raw entry from index. >> > > I don't think cloning on filtered revlog make sense. We should enforce > clone to run on unfiltered ones. > Good catch. > > + linkrev = entry[4] >> + p1 = index[entry[5]][7] >> + p2 = index[entry[6]][7] >> + node = entry[7] >> + text = self.revision(rev) >> > > IF we reuse delta, getting the revision here might lead to necessary > processing. Am I missing something? It will. Not a huge hit since it is just a patch for every revision since the base revision should be cached, but a hit nonetheless. I'll fix this. Be warned: this will require a little extra code to duplicate functionality in addrevision(). > > > + # (Possibly) reuse the delta from the revlog if allowed >> and >> + # the revlog chunk is a delta. >> + cachedelta = None >> + if populatecachedelta: >> + dp = self.deltaparent(rev) >> + if dp != nullrev: >> + cachedelta = (dp, str(self._chunk(rev))) >> + >> + destrevlog.addrevision(text, tr, linkrev, p1, p2, >> + cachedelta=cachedelta, node=node) >> + >> + if addrevisioncb: >> + addrevisioncb(self, rev, node) >> + finally: >> + destrevlog._lazydeltabase = oldlazydeltabase >> + destrevlog._aggressivemergedeltas = oldamd >> > > > -- > Pierre-Yves David >
_______________________________________________ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel