On 01/03/2017 12:57 AM, Gregory Szorc wrote:
# HG changeset patch
# User Gregory Szorc <gregory.sz...@gmail.com>
# Date 1483395249 28800
#      Mon Jan 02 14:14:09 2017 -0800
# Node ID 96837453d041fe7289dd3e6450aee4961517c5b5
# Parent  da38ab97f1f4d312966431a16d1f2acb90b75242
revlog: use compression engine APIs for decompression

Now that compression engines declare their header in revlog chunks and
can decompress revlog chunks, we refactor revlog.decompress() to use
them.

Making full use of the property that revlog compressor objects are
reusable, revlog instances now maintain a dict mapping an engine's
revlog header to a compressor object. This is not only a performance
optimization for engines where compressor object reuse can result in
better performance, but it also serves as a cache of header values so we
don't need to perform redundant lookups against the compression engine
manager. (Yes, I measured, and the overhead of a function call versus a
dict lookup was observed.)

This change effectively replaces manual, inline lookup code with a dict
lookup plus an extra function call where compression is actually used.

On the mozilla-unified repository:

$ hg perfrevlogchunks -c
! chunk
! wall 1.955183 comb 1.960000 user 1.930000 sys 0.030000 (best of 6)
! wall 2.003241 comb 2.000000 user 1.960000 sys 0.040000 (best of 5)
! chunk batch
! wall 1.774444 comb 1.770000 user 1.750000 sys 0.020000 (best of 6)
! wall 1.866939 comb 1.850000 user 1.840000 sys 0.010000 (best of 6)

$ hg perfrevlogchunks -m
! chunk
! wall 2.592354 comb 2.600000 user 2.560000 sys 0.040000 (best of 4)
! wall 2.733626 comb 2.730000 user 2.670000 sys 0.060000 (best of 4)
! chunk batch
! wall 2.469384 comb 2.480000 user 2.420000 sys 0.060000 (best of 4)
! wall 2.580613 comb 2.590000 user 2.550000 sys 0.040000 (best of 4)

We can see this change has a negative impact on performance. There are
360,210 changelog entries (359,658, or 99.85%, zlib compressed and 552
with the 'u' header).
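The reusable-compressor cache described above can be sketched in isolation. This is a simplified illustration, not Mercurial's actual code: the `_ZlibEngine`/`_ZlibCompressor` classes and the `_ENGINES` registry are hypothetical stand-ins for the compression engine manager.

```python
import zlib

class _ZlibEngine:
    """Stand-in for a compression engine (hypothetical, for illustration)."""
    def revlogcompressor(self):
        return _ZlibCompressor()

class _ZlibCompressor:
    def decompress(self, data):
        return zlib.decompress(data)

# revlog header byte -> engine; stand-in for the engine manager lookup
_ENGINES = {b'x': _ZlibEngine()}

class Revlog:
    def __init__(self):
        # revlog header -> reusable compressor object, as in the patch
        self._decompressors = {}

    def decompress(self, data):
        t = data[0:1]
        try:
            compressor = self._decompressors[t]
        except KeyError:
            # The expensive engine-manager lookup happens at most once
            # per header value; afterwards it is a single dict hit.
            compressor = _ENGINES[t].revlogcompressor()
            self._decompressors[t] = compressor
        return compressor.decompress(data)

rl = Revlog()
blob = zlib.compress(b'hello revlog')  # zlib output starts with 'x' (0x78)
print(rl.decompress(blob))             # b'hello revlog'
```

The dict doubles as the header cache the commit message mentions: after the first chunk with a given header, no further engine-manager lookups occur.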
The mean time per decompress() in non-batch mode increased from 5.428us
to 5.561us, a difference of 0.13us, or 2.5%.

There are 359,342 manifest entries (242,566, or 67.50%, zlib
compressed; 116,602, or 32.45%, with the '\0' header; and 174, or
0.05%, that are empty or have the 'u' header). The mean time per
decompress() in non-batch mode increased from 7.214us to 7.607us, a
difference of 0.393us, or 5.4%.

If we reinsert inline code for "if t == 'x': return decompress(data)",
performance remains unchanged or even becomes faster than before (if we
avoid the checks for '\0' and 'u' before 'x').

My conclusion is that attribute lookups and function calls in Python
can matter for performance. We've known this to be true, which is why
tight loops leverage local variables to avoid attribute lookups. But
I'll be honest: I was surprised to see this come into play for
decompression: I would have expected time spent in the decompressor to
dwarf the Python overhead.

Anyway, `hg perfrevlogchunks` is a microbenchmark. It's a good tool for
measuring raw performance, but it may not be indicative of real-world
behavior. For example, the percentage performance loss on manifests is
troubling. But very few operations need to decompress every manifest
chunk, and when they do, they typically have to perform other
CPU-intensive work, like applying the diff. So the performance loss on
manifests doesn't concern me as much.

What about changelogs? Again, there are only so many operations that
need to read every changelog revision. Fortunately, we have other perf
commands to measure the impact on these operations:

$ hg perfrevlogrevision -c -d 1
! wall 4.624776 comb 4.620000 user 4.600000 sys 0.020000 (best of 3)
! wall 4.711967 comb 4.710000 user 4.700000 sys 0.010000 (best of 3)

$ hg perfrevset 'author(mpm)'
! wall 9.611942 comb 9.600000 user 9.570000 sys 0.030000 (best of 3)
! wall 9.915843 comb 9.900000 user 9.860000 sys 0.040000 (best of 3)

$ hg perfrevset 'author(mpm)' --contexts
! wall 9.930018 comb 9.930000 user 9.900000 sys 0.030000 (best of 3)
! wall 10.007671 comb 10.000000 user 9.970000 sys 0.030000 (best of 3)

Again, we see performance regressions. I feel bad about introducing a
performance regression. But I can argue that's the price to pay for an
abstraction.

I suppose we could restore the inline code for zlib (and presumably
zstd someday) so we don't go through the compression engine API. We
could have "tier 1" compression formats where we support fast/inline
operations while still allowing extensions to implement their own,
fully capable compression engines.
The regression percentages are noticeable enough (I've seen patches for
less) that I think hardcoding the known common case makes sense to
avoid the regression. That is not super nice, but it provides a good
way to move forward (in that great direction) without
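A "tier 1" fast path along the lines discussed above might look like the following sketch. This is hypothetical, not the actual patch: the hardcoded headers are checked inline, and only unknown headers fall back to a cached engine lookup (the `_ENGINES` registry is an assumed stand-in for the engine manager).

```python
import zlib

# Fallback registry for extension-provided engines (hypothetical name).
_ENGINES = {}

class Revlog:
    def __init__(self):
        self._decompressors = {}

    def decompress(self, data):
        t = data[0:1]
        # Tier 1: hardcoded fast paths for the overwhelmingly common
        # headers, skipping the engine API entirely.
        if t == b'x':
            return zlib.decompress(data)
        if t == b'\0':
            return data
        if t == b'u':
            return data[1:]
        # Everything else dispatches through the (cached) engine API,
        # so extensions can still register fully capable engines.
        try:
            compressor = self._decompressors[t]
        except KeyError:
            try:
                compressor = _ENGINES[t].revlogcompressor()
            except KeyError:
                raise ValueError('unknown compression type %r' % t)
            self._decompressors[t] = compressor
        return compressor.decompress(data)
```

The trade-off is the one named in the thread: the common formats bypass the abstraction for speed, while the engine API remains the extension point for everything else.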
diff --git a/mercurial/revlog.py b/mercurial/revlog.py
--- a/mercurial/revlog.py
+++ b/mercurial/revlog.py
@@ -18,7 +18,6 @@
 import errno
 import hashlib
 import os
 import struct
-import zlib

 # import stuff from node for others to import from revlog
 from .node import (
@@ -39,7 +38,6 @@ from . import (

 _pack = struct.pack
 _unpack = struct.unpack
-_decompress = zlib.decompress

 # revlog header flags
 REVLOGV0 = 0
@@ -298,6 +296,8 @@ class revlog(object):
         self._chunkclear()
         # revnum -> (chain-length, sum-delta-length)
         self._chaininfocache = {}
+        # revlog header -> revlog compressor
+        self._decompressors = {}

     @util.propertycache
     def _compressor(self):
@@ -1374,17 +1374,43 @@ class revlog(object):
         """
         if not data:
             return data
+
+        # Revlogs are read much more frequently than they are written, so
+        # performance of this function is important.
+        #
+        # We can make a few assumptions about revlogs:
+        #
+        # 1) the majority of chunks will be compressed (as opposed to inline
+        #    raw data).
+        # 2) decompressing *any* data will likely be at least 10x slower than
+        #    returning raw inline data.
+        #
+        # It follows that we want to optimize the "decompress compressed data"
+        # case over "return inline data." The code below does this by putting
+        # the self._decompressors lookup first and *then* checks for our
+        # special cases of the ``\0`` and ``u`` headers.
+        #
+        # For cases where data is actually compressed, this avoids comparison
+        # against the "null" decompressors and allows us to dispatch to the
+        # decompressor quicker. According to `hg perfrevlogchunks`, this is
+        # ~0.5% faster.
         t = data[0]
-        if t == '\0':
-            return data
-        if t == 'x':
+        try:
+            compressor = self._decompressors[t]
+        except KeyError:
+            if t == '\0':
+                return data
+            elif t == 'u':
+                return util.buffer(data, 1)
+
             try:
-                return _decompress(data)
-            except zlib.error as e:
-                raise RevlogError(_('revlog decompress error: %s') % str(e))
-        if t == 'u':
-            return util.buffer(data, 1)
-        raise RevlogError(_('unknown compression type %r') % t)
+                engine = util.compengines.forrevlogheader(t)
+                compressor = engine.revlogcompressor()
+                self._decompressors[t] = compressor
+            except KeyError:
+                raise RevlogError(_('unknown compression type %r') % t)
+
+        return compressor.decompress(data)

     def _isgooddelta(self, d, textlen):
         """Returns True if the given delta is good. Good means that it is
         within

_______________________________________________
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
--
Pierre-Yves David