On Fri, Apr 23, 2010 at 9:23 AM, David Cournapeau <[email protected]> wrote:
> On Fri, Apr 23, 2010 at 2:03 PM, P.J. Eby <[email protected]> wrote:
>> At 10:16 AM 4/23/2010 +0900, David Cournapeau wrote:
>>>
>>> In my case, it is not even the issue of many eggs (I always install
>>> things with --single-version-externally-managed and I forbid any code
>>> to write into easy_install.pth). Importing pkg_resources alone
>>> (python -c "import pkg_resources") takes half a second on my netbook.
>>
>> I find that weird, to say the least. On my desktop just now, with a
>> sys.path 79 entries long (including 41 .eggs), it's a "blink and you missed
>> it" operation. I'm curious what the difference might be.
>>
>> (Running timeit -s 'import pkg_resources' 'reload(pkg_resources)' gives a
>> timing result of 61.9 milliseconds for me.)
>
> I should re-emphasize that the half-second number was on a netbook,
> which is a very weak machine on every account (CPU, memory size and
> disk capabilities). But using pkg_resources for console_scripts in the
> package I am working on made a big difference (more time in spent in
> importing pkg_resources than everything else). Since we are talking
> about import times, I guess the issue is the same as for namespace
> packages. I have noticed this slow behavior on every machine I have
> ever had my hands on, be it mine or someone else, on linux, windows or
> mac os x.
>
> My (limited) understanding of pkg_resources is that is that it scales
> linearly with the number of packages it is aware of, and that it needs
> to scan a few directories for every package. Importing pkg_resources
> causes many more syscalls than relatively big packages (~ 1000 for
> python -c "", 3000 for importing one of numpy/wx/gtk, 6000 for
> pkg_resources). Assuming those are unavoidable (and the current
> namespace implementation in setuptools requires it, right ?), I don't
> see a way to reduce that cost significantly,

There's an in-memory cache, though, which probably makes it faster
already.

Now, if we had a way to know that a directory tree hasn't changed on
the system, a persistent cache would dramatically reduce the work.
Unfortunately, I think this is impossible unless we watch the
directories (and even then, it would be quite hard to implement).

We can probably have a persistent cache for zip files, though, because
we can avoid browsing their contents again if the zip file hasn't
changed.

For regular directories, I haven't profiled it, but the bottleneck is
probably find_on_path(), the function that gets called for every
directory in sys.path to look for .eggs. Since that code mostly does
string work besides the I/O, maybe it could be reimplemented in C.

I'd be very interested in speeding up this process, as we will have
something similar in pkg_util once PEP 376 is accepted.

Tarek

>
> cheers,
>
> David

--
Tarek Ziadé | http://ziade.org
_______________________________________________
Distutils-SIG maillist - [email protected]
http://mail.python.org/mailman/listinfo/distutils-sig
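
To make the zip-file idea above concrete, here is a minimal sketch of a persistent cache that skips re-reading an archive when it hasn't changed. It is not setuptools code: the function name cached_zip_names, the JSON cache file, and the choice of (size, mtime) as the "unchanged" test are all assumptions made for illustration, and it is written for current Python rather than the 2010 interpreters discussed in the thread.

```python
import json
import os
import zipfile


def cached_zip_names(zip_path, cache_path):
    """Return the member names of zip_path (hypothetical helper).

    A JSON file at cache_path is reused when the archive's size and
    modification time match what was recorded on the previous run.
    """
    st = os.stat(zip_path)
    stamp = [st.st_size, st.st_mtime_ns]  # assumed "unchanged" test
    key = os.path.abspath(zip_path)

    try:
        with open(cache_path, encoding="utf-8") as f:
            cache = json.load(f)
    except (OSError, ValueError):
        cache = {}  # missing or corrupt cache: start from scratch

    entry = cache.get(key)
    if entry is not None and entry["stamp"] == stamp:
        return entry["names"]  # archive unchanged: do not open it

    # Archive is new or has changed: read its table of contents once.
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()

    cache[key] = {"stamp": stamp, "names": names}
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(cache, f)
    return names
```

A single os.stat() per archive replaces opening and scanning it, which is where the saving would come from. The same trick does not carry over to plain directories, since a directory's own mtime does not reflect changes deeper in the tree.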
