At 10:47 AM 6/28/2007 -0500, Dave Peterson wrote:
>Has anyone done any investigation into the performance implications of
>having large numbers of eggs installed?  Is there any sort of
>performance hit?
>
>It seems to me that having a really large path might slow down imports a
>bit, though I suspect this is in C code so probably not a significant
>problem.

If the eggs are zipped, the performance overhead for imports is
negligible, although there is a small startup cost to read the zipfile
indexes.  Python caches zipfile indexes in memory, so checking whether a
module is present is just a dictionary lookup, which is much faster than
searching a directory on sys.path.

If the eggs are *not* zipped, however, the performance impact on
individual imports is much higher.  That's why easy_install installs
eggs zipped by default.

> It also seems like there might be some startup penalties due
>to the overhead of setting up the path when using eggs, but this is a
>one-time cost during python startup, so probably not too bad either.
>
>I'm asking because we're in the process of switching our open-source
>Enthought Tool Suite library to a distribution of components via eggs
>and we're having some internal debate as to whether we need to minimize
>the number of eggs or not.  It definitely seems nice to have smaller
>subsets of functionality -- from the point of being able to make things
>stable, managing their APIs, managing cross-component dependencies, and
>from the user update size viewpoint.  But are we paying a performance
>penalty for going too small in scope with our eggs?

I suggest you measure what you're concerned about.  At one point, I did
some timing tests that suggested that if you put the entire Cheeseshop
on sys.path as zipped eggs, you might increase Python's startup time by
a second or two.  But your mileage may vary, and the Cheeseshop has
increased a lot in size since then.  ;)

By the way, the long term plan for setuptools is that it should be able
to install things the "old fashioned" way and still be able to manage
them, using an installation manifest inside the .egg-info directories.
In that way, you'd have all the benefits of separate distribution and a
managed installation, as well as the benefits of having only one
directory on sys.path.  But I haven't done any work on implementing this
yet.

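For anyone who wants to take the "measure it" suggestion literally, here
is a minimal sketch of one way to do so.  It is not the timing test
mentioned above.  It assumes a current Python 3, and the helper names
(build_entries, time_imports) and the generated module names are made up
for illustration.  It puts a number of throwaway zipped or unzipped
entries at the front of sys.path and times a first ("cold") import, which
has to set up an importer for every entry, and a second ("warm") import,
which reuses the cached importers.

```python
import importlib
import os
import sys
import tempfile
import time
import zipfile


def build_entries(root, count, zipped):
    """Create `count` sys.path entries, each holding one tiny module."""
    entries = []
    kind = "zip" if zipped else "dir"
    for i in range(count):
        name = f"mod_{kind}_{i}"          # hypothetical module name
        source = f"VALUE = {i}\n"
        if zipped:
            # zipimport goes by file contents, so ".egg" works as a suffix
            path = os.path.join(root, name + ".egg")
            with zipfile.ZipFile(path, "w") as zf:
                zf.writestr(name + ".py", source)
        else:
            path = os.path.join(root, name)
            os.mkdir(path)
            with open(os.path.join(path, name + ".py"), "w") as fh:
                fh.write(source)
        entries.append((path, name))
    return entries


def time_imports(count, zipped):
    """Return (cold_seconds, warm_seconds, last_value, previous_value)."""
    if count < 2:
        raise ValueError("need at least two entries")
    with tempfile.TemporaryDirectory() as root:
        entries = build_entries(root, count, zipped)
        saved_path = list(sys.path)
        # Put the new entries first so every one of them is searched.
        sys.path[:0] = [path for path, _ in entries]
        importlib.invalidate_caches()
        try:
            # Cold: the module lives in the last entry, so an importer is
            # created (and each zip index read) for every entry before it.
            start = time.perf_counter()
            last = importlib.import_module(entries[-1][1])
            cold = time.perf_counter() - start

            # Warm: the importers for all earlier entries are now cached.
            start = time.perf_counter()
            prev = importlib.import_module(entries[-2][1])
            warm = time.perf_counter() - start
        finally:
            # Undo the changes so repeated runs start from the same state.
            sys.path[:] = saved_path
            for path, name in entries:
                sys.modules.pop(name, None)
                sys.path_importer_cache.pop(path, None)
    return cold, warm, last.VALUE, prev.VALUE


if __name__ == "__main__":
    for zipped in (True, False):
        cold, warm, _, _ = time_imports(100, zipped)
        label = "zipped" if zipped else "unzipped"
        print(f"{label:9s} cold={cold * 1000:.2f} ms  warm={warm * 1000:.2f} ms")
```

The numbers will depend on the machine, the filesystem, and the Python
version.  Current Python 3 also caches directory listings, so the gap
between zipped and unzipped entries may look quite different from what
is described above.  To see what a real installation costs at startup,
running the interpreter with `python -X importtime` is another option.
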
(Actually, now that I think of it, somebody could probably create a
zc.buildout recipe to install eggs in an unpacked fashion (and let
buildout handle uninstallation).  The tricky part would be namespace
package __init__ modules, since multiple eggs can share responsibility
for the __init__, and this might confuse zc.buildout.)

_______________________________________________
Distutils-SIG maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
