One word: "overhead"
30% does seem a little high... but when you have 2 threads you are going from basically zero overhead to whatever it takes to get both threads to run (as you say: GIL passes in/out, the OS cost of scheduling your thread, etc.). Consider that this might be just enough to bump a nice tight processor cache into a more "thrashy" one...
On Mar 25, 2009, at 5:55 AM, Mads Darø Kristensen wrote:
Thank you for the explanation. That does make sense: when I measure the time spent performing the tasklets, it takes more than twice as long when performing two (identical) tasklets, so the added 30% is definitely not being spent on my number-crunching tasklets.
I'll be reimplementing my execution environment using processes sometime soon :-)
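For what it's worth, a minimal sketch of what a process-based version might look like, using the standard multiprocessing module; the crunch() workload here is a made-up stand-in for the actual tasklets, not Mads' real code:

```python
# Illustrative sketch: run CPU-bound work in separate processes so each
# worker has its own interpreter and therefore its own GIL.
from multiprocessing import Pool

def crunch(n):
    # stand-in for a number-crunching tasklet
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    pool = Pool(processes=2)
    try:
        # both calls can genuinely run in parallel on a dual-core machine
        results = pool.map(crunch, [1000000, 1000000])
    finally:
        pool.close()
        pool.join()
    print(results)
```

On a dual-core box, two identical crunch() jobs submitted this way should take roughly as long as one, which is the behavior the thread-based version cannot deliver under the GIL.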
Best regards
Mads
Kristján Valur Jónsson wrote:
There are probably two reasons for this.
a) The GIL is released for the duration of any time-consuming system call. This allows time for another thread to step in.
b) Acquiring the lock, at least on Windows, will cause the thread to do a few hundred trylock spins. In fact, this should be removed on Windows since it is not appropriate for a resource that is normally occupied...
The effect of b) is probably small. But a) is real, and it would suggest that a large portion of the time is spent outside of Python, performing system calls such as send() and recv(), which is hardly surprising.
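The effect of a) is easy to demonstrate with plain threads; a small sketch, with time.sleep() standing in for a blocking call like send() or recv():

```python
# Sketch of point a): the GIL is dropped around blocking system calls,
# so two threads that are both blocked make progress concurrently.
import threading
import time

def blocking_work():
    time.sleep(0.5)  # the GIL is released while the thread blocks here

start = time.time()
threads = [threading.Thread(target=blocking_work) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
# the two sleeps overlap, so wall time is roughly 0.5s, not 1.0s
print("elapsed: %.2fs" % elapsed)
```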
K
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Mads Darø Kristensen
Sent: 25 March 2009 08:29
To: stackless list
Subject: Re: [Stackless] question on preemptive scheduling semantics
Replying to myself here...
I have now tested it more thoroughly, and I get some surprising results (surprising to me, at least). When running a single-threaded Stackless scheduler I get the expected 100% CPU load when I try to stress it, but running two threads on my dual-core machine yielded a CPU load of approximately 130%. What gives?
Seeing as the global interpreter lock should get in the way of utilizing more than one core, shouldn't I be seeing that using two threads (and two schedulers) would yield the same 100% CPU load as using a single thread did?
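The GIL's effect on pure CPU-bound threads is easy to reproduce outside Stackless; a small sketch (exact timings will of course vary by machine):

```python
# Sketch of the experiment: two CPU-bound threads under the GIL take
# about as long as running the same work back-to-back, because only one
# thread executes Python bytecode at a time.
import threading
import time

def spin(n):
    # pure-Python busy loop; holds the GIL except at periodic check intervals
    x = 0
    for i in range(n):
        x += i
    return x

N = 2000000

t0 = time.time()
spin(N)
spin(N)
sequential = time.time() - t0

t0 = time.time()
workers = [threading.Thread(target=spin, args=(N,)) for _ in range(2)]
for w in workers:
    w.start()
for w in workers:
    w.join()
threaded = time.time() - t0

print("sequential: %.2fs  threaded: %.2fs" % (sequential, threaded))
```

The 130% load rather than 100% would then come from the thread-switching and lock overhead, not from any real overlap of Python execution.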
I'm not here to start another "global interpreter lock" discussion, so if there are obvious answers to be found in the mailing list archives, just tell me to RTFM :)
Best regards
Mads
Mads Darø Kristensen wrote:
Hi Jeff.
Jeff Senn wrote:
Hm. Do you mean "thread" or "process"? Because of the GIL you cannot use threads to overlap Python execution within one interpreter (this has been discussed at great length here many times...) -- depending on how you are measuring, perhaps you would aspire to get 200%, 400%, etc. for multicore...
I mean thread, not process. And what I meant by 100% utilization was 200% for the 2-core Mac I tested on... At least that was what I thought I saw - I'll have to test that again some time :-)
Best regards
Mads
_______________________________________________
Stackless mailing list
[email protected]
http://www.stackless.com/mailman/listinfo/stackless
--
Med venlig hilsen / Best regards
Mads D. Kristensen
Blog: http://kedeligdata.blogspot.com/
Work homepage: http://www.daimi.au.dk/~madsk