I've long been puzzled by how long a buildout takes to run when it has
multiple parts whose required distributions are largely the same.
Profiling one, I found two hot spots, and fixing them yields a
several-fold improvement in performance.

First, zc.buildout.easy_install._log_requirements was doing expensive
requirements parsing and sorting even when no message would be logged.
I committed a fix for it.  On a 10-part buildout with a large "eggs"
option for each part, the fix cut update time, as measured under
cProfile, from 93 seconds to 15 seconds:

http://svn.zope.org/zc.buildout/trunk/src/zc/buildout/easy_install.py?rev=124059&r1=122980&r2=124059
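
For anyone who would rather not read the diff, the general shape of that
kind of fix is sketched below.  This is a minimal illustration of the
pattern, not the committed code: the logger name, the function names and
the call counter are all made up for the example.  The idea is just to
ask the logger whether it would emit anything before paying for the
formatting.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("sketch.easy_install")

# Counts how often the expensive step actually runs.
format_calls = {"count": 0}


def describe_requirements(requirements):
    """Stand-in for the expensive parsing and sorting step."""
    format_calls["count"] += 1
    return ", ".join(sorted(str(req) for req in requirements))


def log_requirements(requirements, level=logging.DEBUG):
    # Return early when no handler would ever see the message, so the
    # expensive formatting is skipped entirely.
    if not logger.isEnabledFor(level):
        return
    logger.log(level, "Requirements: %s", describe_requirements(requirements))
```

With the logger set to WARNING, calling log_requirements() never reaches
describe_requirements(); with it set to DEBUG, it does.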

Second, instantiating pkg_resources.Environment, including the
setuptools.package_index.PackageIndex subclass, is very expensive.  It
was being done multiple times for any given part, and again for parts
whose environments were identical.  There was already some global
caching for package indexes, which I've duplicated for environments in
the attached patch.
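
Stripped of the pkg_resources specifics, the caching idea looks roughly
like the sketch below.  It is an illustration under assumptions, not the
patch itself: get_cached, clear_cache and the factory argument are
hypothetical names, and the factory stands in for whatever expensive
object is being built.

```python
# Module-level cache shared by every caller in the process.
_cache = {}


def get_cached(executable, path, factory):
    """Return the object built for (executable, path), building it once."""
    # Lists are not hashable, so normalise the path to a tuple for the key.
    key = (executable, None if path is None else tuple(path))
    try:
        return _cache[key]
    except KeyError:
        value = _cache[key] = factory(executable, path)
        return value


def clear_cache():
    """Forget everything, e.g. between test runs."""
    _cache.clear()
```

A cache like this assumes that whatever the key describes does not
change between calls.  If the contents of a directory on the path change
after the first call, later callers get the object that was built
before the change.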

Unfortunately, I haven't been able to get a clean test environment for
the life of me.  I'm using a clean Python 2.7 build from source, turning
everything in ~/.buildout/default.cfg off, and running the tests in a
clean checkout of the zc.buildout/trunk buildout.  Even under those
conditions I get 17 failing tests before making any changes.  With the
environment cache applied I see 41 failures, and I can't make sense of
them.  The patch yields a further 2-3 fold decrease, to 6 seconds, for
the same buildout, and it is driven by profiling data, not guesswork.
Can someone help me get this patch in?
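
The timings above are from cProfile runs.  For anyone who wants to
reproduce that kind of measurement, a minimal sketch follows.  It is
written for Python 3 and uses only the standard library; profile_call
and busy_work are hypothetical helpers, and busy_work stands in for
whatever entry point you want to measure.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the costliest call chains come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


def busy_work(count):
    """Placeholder workload for the example."""
    return sum(i * i for i in range(count))
```

Calling profile_call(busy_work, 1000) returns the function's own result
plus a report listing the ten entries with the highest cumulative time.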

Finally, it would be great to get releases of zc.buildout with these
performance improvements out into the world.  I've been hearing more
and more complaints about buildout run times, and these are easy fixes.
If we can get the second, attached patch in quickly, then I'd say we
should release with both.  If not, it's still worth cutting a release
for the first, already committed patch, which yields the greatest
improvement.

Thanks!
Ross

Index: src/zc/buildout/easy_install.py
===================================================================
--- src/zc/buildout/easy_install.py	(revision 124061)
+++ src/zc/buildout/easy_install.py	(working copy)
@@ -231,8 +231,26 @@
     _indexes[key] = index
     return index
 
-clear_index_cache = _indexes.clear
 
+_envs = {}
+def _get_env(executable, path=None):
+    key = executable, None if path is None else tuple(path)
+    env = _envs.get(key)
+    if env is not None:
+        return env
+
+    env = pkg_resources.Environment(search_path=path,
+                                    python=_get_version(executable))
+
+    _envs[key] = env
+    return env
+
+
+def clear_index_cache():
+    _indexes.clear()
+    _envs.clear()
+
+
 if is_win32:
     # work around spawn lamosity on windows
     # XXX need safe quoting (see the subprocess.list2cmdline) and test
@@ -395,8 +413,7 @@
         if self._dest is None:
             newest = False
         self._newest = newest
-        self._env = pkg_resources.Environment(path,
-                                              python=_get_version(executable))
+        self._env = _get_env(executable, path)
         self._index = _get_index(executable, index, links, self._allow_hosts,
                                  self._path)
 
@@ -526,10 +543,7 @@
         return best_we_have, None
 
     def _load_dist(self, dist):
-        dists = pkg_resources.Environment(
-            dist.location,
-            python=_get_version(self._executable),
-            )[dist.project_name]
+        dists = _get_env(self._executable, dist.location)[dist.project_name]
         assert len(dists) == 1
         return dists[0]
 
@@ -573,10 +587,7 @@
                     *args)
 
             dists = []
-            env = pkg_resources.Environment(
-                [tmp],
-                python=_get_version(self._executable),
-                )
+            env = _get_env(self._executable, [tmp])
             for project in env:
                 dists.extend(env[project])
 
@@ -619,12 +630,7 @@
                     else:
                         os.remove(newloc)
                 os.rename(d.location, newloc)
-
-                [d] = pkg_resources.Environment(
-                    [newloc],
-                    python=_get_version(self._executable),
-                    )[d.project_name]
-
+                [d] = _get_env(self._executable, [newloc])[d.project_name]
                 result.append(d)
 
             return result
@@ -780,10 +786,8 @@
                     # Getting the dist from the environment causes the
                     # distribution meta data to be read.  Cloning isn't
                     # good enough.
-                    dists = pkg_resources.Environment(
-                        [newloc],
-                        python=_get_version(self._executable),
-                        )[dist.project_name]
+                    dists = _get_env(self._executable, [newloc]
+                                     )[dist.project_name]
                 else:
                     # It's some other kind of dist.  We'll let easy_install
                     # deal with it:
@@ -910,7 +914,7 @@
         # Note that we don't use the existing environment, because we want
         # to look for new eggs unless what we have is the best that
         # matches the requirement.
-        env = pkg_resources.Environment(ws.entries)
+        env = _get_env(self._executable, ws.entries)
         while requirements:
             # Process dependencies breadth-first.
             req = self._constrain(requirements.pop(0))
_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
http://mail.python.org/mailman/listinfo/distutils-sig
