On Fri, Jun 12, 2009 at 12:08 PM, Jeremy Hylton<[email protected]> wrote:
> On Fri, Jun 12, 2009 at 11:45 AM, Jesse Noller<[email protected]> wrote:
>> Really? Is this the worst thing ever? How many of us building heavily
>> threaded I/O bound applications are truly hampered by this? Yes; this
>> sucks for CPU bound applications, that's been known since the earth
>> cooled.
>
> I'm not sure I understand how to distinguish between I/O bound threads
> and CPU bound threads. If you've got a relatively simple
> multi-threaded application like an HTTP fetcher with a thread pool
> fetching a lot of urls, you're probably going to end up having more
> than one thread with input to process at any instant. There's a ton
> of Python code that executes when that happens. You've got a urllib
> addinfourl wrapper, a httplib HTTPResponse (with read & _safe_read)
> and a socket _fileobject. Heaven help you if you are using readline.
> So I could imagine even this trivial I/O bound program having lots of
> CPU contention.
>
> Jeremy
>
Speaking as someone who does have lots of apps doing heavily threaded
URL fetching (puts, gets, deletes) - the GIL ends up not bothering me,
and threading does speed things up (but not as much as I'd like). I tend
to push heavier data parsing off via multiprocessing, and stick to just
threads for the GET/PUT/POST requests.

I had one benchmark in PEP 371 which did url fetching
(http://www.python.org/dev/peps/pep-0371/):

cmd: python run_benchmarks.py url_get.py
Importing url_get
Starting tests ...
non_threaded (1 iters)  0.124774 seconds
threaded (1 threads)    0.120478 seconds
processes (1 procs)     0.121404 seconds

non_threaded (2 iters)  0.239574 seconds
threaded (2 threads)    0.146138 seconds
processes (2 procs)     0.138366 seconds

non_threaded (4 iters)  0.479159 seconds
threaded (4 threads)    0.200985 seconds
processes (4 procs)     0.188847 seconds

non_threaded (8 iters)  0.960621 seconds
threaded (8 threads)    0.659298 seconds
processes (8 procs)     0.298625 seconds

For heavy HTTP handling though, I rapidly move to using pycurl, rather
than httplib, which of course brings a C module into play and allows me
to sidestep some of the issues even more.

Note that I'm not advocating/saying "things are fine as is" - I'm a
pretty squeaky wheel when it comes to making this space (threads, the
GIL, etc) better. Right now, my biggest thing to watch is
unladen-swallow in this regard, as I don't see a lot of movement for
this in core today.

That being said, I think people get hung up on the GIL before even
knowing whether it affects their application, and are too quick to
discount Python threads as a whole before figuring it out for
themselves.

jesse
_______________________________________________
concurrency-sig mailing list
[email protected]
http://mail.python.org/mailman/listinfo/concurrency-sig
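
For anyone who wants to reproduce the general shape of the url_get comparison without network access, here is a minimal sketch. It is not the PEP 371 benchmark script: fake_fetch, run_serial, and run_pooled are hypothetical names, and time.sleep stands in for waiting on a socket. That means it shows only the overlap of blocking waits, and none of the Python-level CPU work (urllib, httplib, readline) that the quoted message worries about.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(delay=0.05):
    """Hypothetical stand-in for one URL fetch: blocks without using the CPU."""
    time.sleep(delay)  # sleeping releases the GIL, much like a blocking socket read
    return delay


def run_serial(n, delay=0.05):
    """Perform n simulated fetches one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(n):
        fake_fetch(delay)
    return time.perf_counter() - start


def run_pooled(n, delay=0.05, executor_cls=ThreadPoolExecutor):
    """Perform n simulated fetches on n workers; return elapsed seconds."""
    start = time.perf_counter()
    with executor_cls(max_workers=n) as pool:
        # list() forces all results to be collected before the timer stops
        results = list(pool.map(fake_fetch, [delay] * n))
    elapsed = time.perf_counter() - start
    assert len(results) == n
    return elapsed


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print("n=%d serial=%.3fs threaded=%.3fs"
              % (n, run_serial(n), run_pooled(n)))
```

Passing executor_cls=ProcessPoolExecutor (also from concurrent.futures) gives a rough analogue of the "processes" rows; on platforms that spawn worker processes, that call needs to stay under the `if __name__ == "__main__":` guard. To see the contention Jeremy describes, the sleep would have to be replaced with real response parsing in Python.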
