I've long been perplexed by how long a buildout takes to run when it has multiple parts whose required distributions largely overlap. Taking a stab at it, I found two hot spots, and fixing them yields a several-fold improvement in performance.

First, zc.buildout.easy_install._log_requirements was doing expensive requirements parsing and sorting even when no message would be logged. I committed a fix for that. On a 10-part buildout with a large "eggs" option for each part, it cut update time, as measured under cProfile, from 93 seconds to 15 seconds:

http://svn.zope.org/zc.buildout/trunk/src/zc/buildout/easy_install.py?rev=124059&r1=122980&r2=124059

Second, instantiating pkg_resources.Environment (and its setuptools.package_index.PackageIndex subclass) is very expensive. It was being done multiple times for any given part, and it was being repeated for parts whose environments were identical. There was already some global caching for package indexes, which I've duplicated for environments in the attached patch.

Unfortunately, I haven't been able to get a clean test run for the life of me. I'm using a clean Python 2.7 built from source, with everything in ~/.buildout/default.cfg turned off, and running the tests in a clean checkout of the zc.buildout/trunk buildout. Even under those conditions I get 17 failing tests before making any changes. With the environment cache I see 41 failures, and I can't make sense of them. The patch yields another 2-3 fold decrease, to 6 seconds for the same buildout, and it is driven by profiling data, not guessing. Can someone help me get this patch in?

Finally, it would be great to see a release of zc.buildout with these performance improvements get out into the world. I've been hearing more and more complaints about buildout run times, and these are easy fixes. If we can get the second, attached patch in quickly, then I'd say we should release with both. If not, it's still worth cutting a release for the first, already committed fix, which yields the greater improvement.

Thanks!
Ross

Index: src/zc/buildout/easy_install.py
===================================================================
--- src/zc/buildout/easy_install.py	(revision 124061)
+++ src/zc/buildout/easy_install.py	(working copy)
@@ -231,8 +231,26 @@
     _indexes[key] = index
     return index
 
-clear_index_cache = _indexes.clear
+_envs = {}
 
+def _get_env(executable, path=None):
+    key = executable, tuple(path)
+    env = _envs.get(key)
+    if env is not None:
+        return env
+
+    env = pkg_resources.Environment(search_path=path,
+                                    python=_get_version(executable))
+
+    _envs[key] = env
+    return env
+
+
+def clear_index_cache():
+    _indexes.clear()
+    _envs.clear()
+
+
 if is_win32:
     # work around spawn lamosity on windows
     # XXX need safe quoting (see the subprocess.list2cmdline) and test
@@ -395,8 +413,7 @@
         if self._dest is None:
             newest = False
         self._newest = newest
-        self._env = pkg_resources.Environment(path,
-                                              python=_get_version(executable))
+        self._env = _get_env(executable, path)
         self._index = _get_index(executable, index, links, self._allow_hosts,
                                  self._path)
 
@@ -526,10 +543,7 @@
         return best_we_have, None
 
     def _load_dist(self, dist):
-        dists = pkg_resources.Environment(
-            dist.location,
-            python=_get_version(self._executable),
-            )[dist.project_name]
+        dists = _get_env(self._executable, dist.location,)[dist.project_name]
         assert len(dists) == 1
         return dists[0]
 
@@ -573,10 +587,7 @@
             *args)
 
         dists = []
-        env = pkg_resources.Environment(
-            [tmp],
-            python=_get_version(self._executable),
-            )
+        env = _get_env(self._executable, [tmp])
         for project in env:
             dists.extend(env[project])
 
@@ -619,12 +630,7 @@
                     else:
                         os.remove(newloc)
                 os.rename(d.location, newloc)
-
-                [d] = pkg_resources.Environment(
-                    [newloc],
-                    python=_get_version(self._executable),
-                    )[d.project_name]
-
+                [d] = _get_env(self._executable, [newloc])[d.project_name]
                 result.append(d)
 
         return result
@@ -780,10 +786,8 @@
                     # Getting the dist from the environment causes the
                     # distribution meta data to be read.  Cloning isn't
                     # good enough.
-                    dists = pkg_resources.Environment(
-                        [newloc],
-                        python=_get_version(self._executable),
-                        )[dist.project_name]
+                    dists = _get_env(self._executable, [newloc]
+                                     )[dist.project_name]
                 else:
                     # It's some other kind of dist.  We'll let easy_install
                     # deal with it:
@@ -910,7 +914,7 @@
         # Note that we don't use the existing environment, because we want
         # to look for new eggs unless what we have is the best that
         # matches the requirement.
-        env = pkg_resources.Environment(ws.entries)
+        env = _get_env(self._executable, ws.entries)
         while requirements:
             # Process dependencies breadth-first.
             req = self._constrain(requirements.pop(0))
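
The caching pattern used in the patch above can be tried out in isolation with the sketch below. It is a stand-alone illustration under stated assumptions, not part of the patch: get_cached(), clear_cache() and the factory argument are hypothetical names, and the factory stands in for the expensive pkg_resources.Environment constructor so the example runs without setuptools. Unlike the patch, the sketch also accepts None or a single string for the path, which is an assumption about what callers might pass.

```python
_cache = {}  # maps (executable, path tuple) to the object built for that key


def get_cached(factory, executable, path=None):
    """Return factory(executable, path), building it once per distinct key."""
    # Wrap a bare string in a list; tuple("eggs") would otherwise split it
    # into single characters.
    if isinstance(path, str):
        path = [path]
    # tuple(None) raises TypeError, so a missing path gets its own key.
    key = (executable, None if path is None else tuple(path))
    try:
        return _cache[key]
    except KeyError:
        value = _cache[key] = factory(executable, path)
        return value


def clear_cache():
    """Forget every cached object, e.g. after the directories have changed."""
    _cache.clear()
```

Two calls with the same executable and the same path entries return the same object, and the factory runs only once. A cache like this is only safe while the cached object does not need to notice changes on disk, so anything that adds eggs to a cached path would have to call clear_cache() afterwards.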

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig