[Python-Dev] Re: Python multithreading without the GIL

2023-07-31 Thread Reza Roboubi

I'm suspicious of pyperformance testing for this reason:

The premise is that Python performs OK despite the GIL because "most of the time is
spent in 'external' libraries."

Pyperformance tests "typical" Python performance, where supposedly most tests
are "OK" despite the GIL. You need multithreading in atypical situations, which may
involve a lot of raw-Python-object "thrashing," with heavy ref-counting, locks,
etc. How do we know that pyperformance actually tests these cases well?
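A small microbenchmark like the following could approximate a workload of that kind. It is a hypothetical sketch, not something pyperformance contains. `shared_object_churn` and its parameters are invented for illustration. Several threads repeatedly load objects from one shared list and store them back, so nearly all the work is reference-count updates on shared objects.

```python
import threading
import time


def shared_object_churn(nthreads: int = 4, iterations: int = 20_000, size: int = 256) -> float:
    """Hypothetical stress test (not part of pyperformance): every thread
    repeatedly reads objects from one shared list and stores them back, so
    most of the work is reference-count traffic on shared objects.
    Returns elapsed wall-clock seconds."""
    shared = [object() for _ in range(size)]

    def worker(offset: int) -> None:
        for i in range(iterations):
            idx = (i + offset) % size
            obj = shared[idx]   # take a new reference to a shared object
            shared[idx] = obj   # store it back: incref new value, decref old one

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} thread(s): {shared_object_churn(nthreads=n):.3f} s")
```

Each thread does a fixed amount of work. Under the GIL, wall time should therefore grow roughly in proportion to the thread count. A free-threaded build would ideally stay flatter, though contention on the shared objects' reference counts may eat into that.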
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UA4VX3JFRRHI6TXPEBCZRWSPDOWQWM2G/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2022-04-25 Thread Guido van Rossum

On Mon, Apr 25, 2022 at 2:33 PM Brett Cannon  wrote:

>
>
> On Sat, Apr 23, 2022 at 8:31 AM  wrote:
>
>> Hello all,
>>
>> I am very excited about a future multithreaded Python. I managed to
>> postpone some rewrites to Rust/Go at the company I work for, precisely
>> because of the potential to have a Python solution in the medium term.
>>
>> I was wondering: is Sam Gross' nogil merge being seriously considered by
>> the core Python team?
>>
>
> Yes, although we have no timeline as to when we will make a decision about
> whether we will accept it or not.
>

We haven't even discussed a *process* for how to decide. OTOH, in two days
at the Language Summit at PyCon, Sam will give a presentation to the core
devs present (which is far from all of us, alas).

> The last update we had on the work was Sam was upstreaming the performance
> improvements he made that were not nogil-specific. The nogil work was also
> being updated for the `main` branch. Once that's all done we will probably
> start a serious discussion as to whether we want to accept it.
>

It's possible that I've missed those code reviews, but I haven't seen a
single PR from Sam, nor have there been any messages from him in this forum
or in any other forums I'm monitoring. I'm hoping that the Language Summit
will change this, but I suspect that there aren't that many perf
improvements in Sam's work that are easily separated from the nogil work.
(To be sure, Christian Heimes seems to have made progress with introducing
mimalloc, which is one of Sam's dependencies, but AFAIK that work hasn't
been finished yet.)

-- 
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)

___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/2LGLMS4JQTB2AFZIS25MBITJZ4TQ2WFF/

[Python-Dev] Re: Python multithreading without the GIL

2022-04-25 Thread Brett Cannon

On Sat, Apr 23, 2022 at 8:31 AM  wrote:

> Hello all,
>
> I am very excited about a future multithreaded Python. I managed to
> postpone some rewrites to Rust/Go at the company I work for, precisely
> because of the potential to have a Python solution in the medium term.
>
> I was wondering: is Sam Gross' nogil merge being seriously considered by
> the core Python team?
>

Yes, although we have no timeline as to when we will make a decision about
whether we will accept it or not. The last update we had on the work was
Sam was upstreaming the performance improvements he made that were not
nogil-specific. The nogil work was also being updated for the `main`
branch. Once that's all done we will probably start a serious discussion as
to whether we want to accept it.
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/KCATAW2AXWFBGYCVN3IQPIKU2QN2GAA3/

[Python-Dev] Re: Python multithreading without the GIL

2022-04-23 Thread brataodream

Hello all,

I am very excited about a future multithreaded Python. I managed to postpone
some rewrites to Rust/Go at the company I work for, precisely because of the
potential to have a Python solution in the medium term.

I was wondering: is Sam Gross' nogil merge being seriously considered by the
core Python team?
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SNSLKDHCE3J2VQHZCWFHNPDAEWGKEWN6/

[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Skip Montanaro

Sam> I think the performance difference is because of different
versions of NumPy.

Thanks all for the help/input/advice. It never occurred to me that two
relatively recent versions of numpy would differ so much for the
simple tasks in my script (array creation & transform). I confirmed
this by removing 1.21.3 and installing 1.19.4 in my 3.9 build.

I also got a little bit familiar with pyperf, and as a "stretch" goal
completely removed random numbers and numpy from my script. (Took me a
couple tries to get my array init and transposition correct. Let's
just say that it's been a while. Numpy *was* a nice crutch...) With no
trace of numpy left, I now get identical results for single-threaded
matrix multiply (a size==1, b size==2):

3.9: matmul: Mean +- std dev: 102 ms +- 1 ms
nogil: matmul: Mean +- std dev: 103 ms +- 2 ms

and a nice speedup for multi-threaded (a size==3, b size==6, nthreads=3):

3.9: matmul_t: Mean +- std dev: 290 ms +- 13 ms
nogil: matmul_t: Mean +- std dev: 102 ms +- 3 ms
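For readers without pyperf installed, the sketch below roughly approximates its "Mean +- std dev" summary using only the standard library. It is an assumption-laden stand-in. pyperf itself does considerably more: it spawns worker processes, calibrates loop counts and discards warmup runs. `mean_std` is a hypothetical helper name.

```python
import statistics
import timeit


def mean_std(func, *args, repeat: int = 5, number: int = 1) -> str:
    """Time func(*args) `repeat` times (each run calls it `number` times) and
    summarise per-call wall time in a pyperf-like 'Mean +- std dev' line."""
    timer = timeit.Timer(lambda: func(*args))
    per_call = [t / number for t in timer.repeat(repeat=repeat, number=number)]
    mean_ms = statistics.mean(per_call) * 1e3
    # stdev needs at least two samples
    std_ms = statistics.stdev(per_call) * 1e3 if repeat > 1 else 0.0
    return f"Mean +- std dev: {mean_ms:.1f} ms +- {std_ms:.1f} ms"


if __name__ == "__main__":
    print("sum:", mean_std(sum, range(1_000_000), repeat=5))
```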

Sam> I'll update the version of NumPy for "nogil" Python if I have
some time this week.

I think it would be sufficient to alert users to the 1.19/1.21
performance differences and recommend they force install 1.19 in
non-nogil builds for testing purposes. Hopefully adding a simple note
to your README will take less time than porting your changes to numpy
1.21 and adjusting your build configs/scripts.

Skip
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/5RXRTNNCYBCILMVATHODFGAZ5ZEQXRZI/

[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Christopher Barker

> I think the performance difference is because of different versions of
> NumPy.
>

Good reason to leave numpy completely out of it, unless you want to test
nogil’s performance effects on numpy code, which is an interesting exercise
in itself.

Also — sorry I didn’t look at your code before, but you really want to keep
the generation of large random arrays out of your benchmark if you can. I
suspect that’s what’s changed in numpy versions.

In any case, do time the random number generation…
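Here is a minimal sketch of keeping data generation out of the measured region and timing it separately instead. It swaps NumPy's random generator for the standard `random` module, an assumption chosen so the example has no dependencies. `random_vector`, `dot` and `time_phases` are hypothetical names.

```python
import random
import time


def random_vector(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)  # seeded so every interpreter sees identical data
    return [rng.random() for _ in range(n)]


def dot(x: list, y: list) -> float:
    return sum(a * b for a, b in zip(x, y))


def time_phases(n: int = 200_000) -> dict:
    """Time data generation and the computation separately, so a change in one
    (e.g. a different random-number implementation) cannot masquerade as a
    change in the other."""
    t0 = time.perf_counter()
    x = random_vector(n, seed=1)
    y = random_vector(n, seed=2)
    t1 = time.perf_counter()
    dot(x, y)
    t2 = time.perf_counter()
    return {"generate": t1 - t0, "compute": t2 - t1}


if __name__ == "__main__":
    for phase, seconds in time_phases().items():
        print(f"{phase:8} {seconds:.4f} s")
```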

-CHB



> Python 3.9 installs NumPy 1.21.3 by default for "pip install numpy". I've
> only built and packaged NumPy 1.19.4 for "nogil" Python. There are
> substantial performance differences between the two NumPy builds for this
> matmul script.
>
> With NumPy 1.19.4, I get practically the same results for both Python
> 3.9.2 and "nogil" Python for "time python3 matmul.py 0 10".
>
> I'll update the version of NumPy for "nogil" Python if I have some time
> this week.
>
> Best,
> Sam
>
> On Sun, Oct 31, 2021 at 5:46 PM Skip Montanaro 
> wrote:
>
>> > Remember that pystone is a terrible benchmark.
>>
>> I understand that. I was only using it as a spot check. I was surprised
>> at how much slower my (threaded or unthreaded) matrix multiply was on nogil
>> vs 3.9+. I went into it thinking I would see an improvement. The
>> Performance section of Sam's design document starts:
>>
>> As mentioned above, the no-GIL proof-of-concept interpreter is about 10%
>> faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.
>>
>>
>> so it didn't occur to me that I'd be looking at a slowdown, much less by
>> as much as I'm seeing.
>>
>> Maybe I've somehow stumbled on some instruction mix for which the nogil
>> VM is much worse than the stock VM. For now, I prefer to think I'm just
>> doing something stupid. It certainly wouldn't be the first time.
>>
>> Skip
>>
>> P.S. I suppose I should have cc'd Sam when I first replied to this
>> thread, but I'm doing so now. I figured my mistake would reveal itself
>> early on. Sam, here's my first post about my little "project."
>> https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
>>
>>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/VLSAMFORVMEIQVH3UH6LOK3OA3GL7C6J/

[Python-Dev] Re: Python multithreading without the GIL

2021-11-01 Thread Sam Gross

Hi Skip,

I think the performance difference is because of different versions of
NumPy. Python 3.9 installs NumPy 1.21.3 by default for "pip install numpy".
I've only built and packaged NumPy 1.19.4 for "nogil" Python. There are
substantial performance differences between the two NumPy builds for this
matmul script.

With NumPy 1.19.4, I get practically the same results for both Python 3.9.2
and "nogil" Python for "time python3 matmul.py 0 10".

I'll update the version of NumPy for "nogil" Python if I have some time
this week.
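The gap came down to which NumPy each interpreter had installed, so it can pay to record package versions next to every timing. Below is a minimal standard-library sketch. `environment_report` is a hypothetical name, and the package list should be whatever the benchmark imports.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("numpy",)) -> dict:
    """Collect interpreter and package versions to store next to benchmark numbers."""
    report = {
        "python": sys.version.split()[0],
        "implementation": platform.python_implementation(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```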

Best,
Sam

On Sun, Oct 31, 2021 at 5:46 PM Skip Montanaro 
wrote:

> > Remember that pystone is a terrible benchmark.
>
> I understand that. I was only using it as a spot check. I was surprised at
> how much slower my (threaded or unthreaded) matrix multiply was on nogil vs
> 3.9+. I went into it thinking I would see an improvement. The Performance
> section of Sam's design document starts:
>
> As mentioned above, the no-GIL proof-of-concept interpreter is about 10%
> faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.
>
>
> so it didn't occur to me that I'd be looking at a slowdown, much less by
> as much as I'm seeing.
>
> Maybe I've somehow stumbled on some instruction mix for which the nogil VM
> is much worse than the stock VM. For now, I prefer to think I'm just doing
> something stupid. It certainly wouldn't be the first time.
>
> Skip
>
> P.S. I suppose I should have cc'd Sam when I first replied to this
> thread, but I'm doing so now. I figured my mistake would reveal itself
> early on. Sam, here's my first post about my little "project."
> https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
>
>
>
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/W23EPICXG3RVOMMCVSM3FVOEN2U3LNM3/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Skip Montanaro

> Remember that pystone is a terrible benchmark.

I understand that. I was only using it as a spot check. I was surprised at
how much slower my (threaded or unthreaded) matrix multiply was on nogil vs
3.9+. I went into it thinking I would see an improvement. The Performance
section of Sam's design document starts:

As mentioned above, the no-GIL proof-of-concept interpreter is about 10%
faster than CPython 3.9 (and 3.10) on the pyperformance benchmark suite.


so it didn't occur to me that I'd be looking at a slowdown, much less by as
much as I'm seeing.

Maybe I've somehow stumbled on some instruction mix for which the nogil VM
is much worse than the stock VM. For now, I prefer to think I'm just doing
something stupid. It certainly wouldn't be the first time.

Skip

P.S. I suppose I should have cc'd Sam when I first replied to this
thread, but I'm doing so now. I figured my mistake would reveal itself
early on. Sam, here's my first post about my little "project."
https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/CGT4EMEA7JEH6CIRTB7Z5UUIKWKREAMF/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Guido van Rossum

Remember that pystone is a terrible benchmark. It only exercises a few
bytecodes, and a modern CPU’s caching and branch prediction make mincemeat
of those. Sam wrote a whole new register-based VM, so perhaps that
exercises different bytecodes.
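One rough way to compare the instruction mix of two workloads is to disassemble them with the `dis` module and count opcodes. This is a static count of the instructions in the code object, not a profile of how often each one executes. Opcode names also differ between stock CPython and Sam's register VM; the disassembly quoted in this thread shows `FUNC_HEADER` and `COPY`, for example. The `vecmul` below is an illustrative stand-in, not the function from the gist.

```python
import dis
from collections import Counter


def opcode_mix(func) -> Counter:
    """Count how often each opcode appears in func's bytecode (static count)."""
    return Counter(instr.opname for instr in dis.get_instructions(func))


def vecmul(a, b):
    # Illustrative dot product, similar in spirit to the vecmul() discussed here.
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result


if __name__ == "__main__":
    for opname, count in opcode_mix(vecmul).most_common():
        print(f"{opname:24} {count}")
```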

On Sun, Oct 31, 2021 at 05:19 Skip Montanaro 
wrote:

> Skip> 1. I use numpy arrays filled with random values, and the output
> array is also a numpy array. The vector multiplication is done in a simple
> for loop in my vecmul() function.
>
> CHB> probably doesn't make a difference for this exercise, but numpy
> arrays make lousy replacements for a regular list ...
>
> Yeah, I don't think it should matter here. Both versions should be
> similarly penalized.
>
> Skip> The results were confusing, so I dredged up a copy of pystone to
> make sure I wasn't missing anything w.r.t. basic execution performance. I'm
> still confused, so will keep digging.
>
> CHB> I'll be interested to see what you find out :-)
>
> I'm still scratching my head. I was thinking there was something about the
> messaging between the main and worker threads, so I tweaked matmul.py to
> accept 0 as a number of threads. That means it would call matmul which
> would call vecmul directly. The original queue-using versions were simply
> renamed to matmul_t and vecmul_t.
>
> I am still confused. Here are the pystone numbers, nogil first, then the
> 3.9 git tip:
>
> (base) nogil_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 5 passes = 0.137658
> This machine benchmarks at 363218 pystones/second
>
> (base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
> Pystone(1.1.1) time for 5 passes = 0.207102
> This machine benchmarks at 241427 pystones/second
>
> That suggests nogil is indeed a definite improvement over vanilla 3.9.
> However, here's a quick nogil v 3.9 timing run of my matrix multiplication,
> again, nogil followed by 3.9 tip:
>
> (base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m9.314s
> user 0m9.302s
> sys 0m0.012s
>
> (base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m4.918s
> user 0m5.180s
> sys 0m0.380s
>
> What's up with that? Suddenly nogil is much slower than 3.9 tip. No
> threads are in use. I thought perhaps the nogil run somehow didn't use
> Sam's VM improvements, so I disassembled the two versions of vecmul. I
> won't bore you with the entire dis.dis output, but suffice it to say that
> Sam's instruction set appears to be in play:
>
> (base) nogil_build% PYTHONPATH=$HOME/tmp ./bin/python3
> Python 3.9.0a4+ (heads/nogil:b0ee2c4740, Oct 30 2021, 16:23:03)
> [GCC 9.3.0] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import matmul, dis
> >>> dis.dis(matmul.vecmul)
>  26   0 FUNC_HEADER 11 (11)
>
>  28   2 LOAD_CONST   2 (0.0)
>   4 STORE_FAST   2 (result)
>
>  29   6 LOAD_GLOBAL  3 254 ('len'; 254)
>   9 STORE_FAST   8 (.t3)
>  11 COPY   9 0 (.t4 <- a)
>  14 CALL_FUNCTION  9 1 (.t4 to .t5)
>  18 STORE_FAST   5 (.t0)
> ...
>
> So I unboxed the two numpy arrays once and used lists of lists for the
> actual work. The nogil version still performs worse by about a factor of
> two:
>
> (base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m9.537s
> user 0m9.525s
> sys 0m0.012s
>
> (base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
> a: (160, 625) b: (625, 320) result: (160, 320) -> 51200
>
> real 0m4.836s
> user 0m5.109s
> sys 0m0.365s
>
> Still scratching my head and am open to suggestions about what to try
> next. If anyone is playing along from home, I've updated my script:
>
> https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d
>
> I'm sure there are things I could have done more efficiently, but I would
> think both Python versions would be similarly penalized by dumb s**t I've
> done.
>
> Skip
>
>
-- 
--Guido (mobile)
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/SSLCURZJD5NLAYN5LFEZ4RJWU5YPQX65/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Skip Montanaro

Skip> 1. I use numpy arrays filled with random values, and the output array
is also a numpy array. The vector multiplication is done in a simple for
loop in my vecmul() function.

CHB> probably doesn't make a difference for this exercise, but numpy arrays
make lousy replacements for a regular list ...

Yeah, I don't think it should matter here. Both versions should be
similarly penalized.

Skip> The results were confusing, so I dredged up a copy of pystone to make
sure I wasn't missing anything w.r.t. basic execution performance. I'm
still confused, so will keep digging.

CHB> I'll be interested to see what you find out :-)

I'm still scratching my head. I was thinking there was something about the
messaging between the main and worker threads, so I tweaked matmul.py to
accept 0 as a number of threads. That means it would call matmul which
would call vecmul directly. The original queue-using versions were simply
renamed to matmul_t and vecmul_t.
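The structure described above might look roughly like the following. This is a hypothetical reconstruction, not the code from the gist. Only the names `matmul`, `vecmul`, `matmul_t` and `vecmul_t` come from the thread. The queue layout is an assumption: all jobs are enqueued before the workers start, so an empty queue means the work is done. The `run` dispatcher is also an assumption.

```python
import queue
import threading


def vecmul(row, col):
    """Dot product computed in a plain Python loop."""
    result = 0.0
    for x, y in zip(row, col):
        result += x * y
    return result


def matmul(a, b):
    """Direct, single-threaded path (nthreads == 0)."""
    bt = list(zip(*b))  # columns of b
    return [[vecmul(row, col) for col in bt] for row in a]


def matmul_t(a, b, nthreads):
    """Queue-based path: worker threads pull (i, j) jobs and fill in the result."""
    bt = list(zip(*b))
    out = [[0.0] * len(bt) for _ in a]
    jobs = queue.Queue()
    for i in range(len(a)):
        for j in range(len(bt)):
            jobs.put((i, j))

    def vecmul_t():
        while True:
            try:
                i, j = jobs.get_nowait()
            except queue.Empty:
                return  # queue was filled up front, so empty means finished
            out[i][j] = vecmul(a[i], bt[j])

    workers = [threading.Thread(target=vecmul_t) for _ in range(nthreads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return out


def run(a, b, nthreads):
    # 0 means no threads and no queue, isolating raw interpreter speed.
    return matmul(a, b) if nthreads == 0 else matmul_t(a, b, nthreads)
```

Passing 0 takes the direct path with no threads or queue at all. That is what makes it useful for separating raw interpreter speed from inter-thread messaging costs.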

I am still confused. Here are the pystone numbers, nogil first, then the
3.9 git tip:

(base) nogil_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 5 passes = 0.137658
This machine benchmarks at 363218 pystones/second

(base) 3.9_build% ./bin/python3 ~/cmd/pystone.py
Pystone(1.1.1) time for 5 passes = 0.207102
This machine benchmarks at 241427 pystones/second

That suggests nogil is indeed a definite improvement over vanilla 3.9.
However, here's a quick nogil v 3.9 timing run of my matrix multiplication,
again, nogil followed by 3.9 tip:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.314s
user 0m9.302s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.918s
user 0m5.180s
sys 0m0.380s

What's up with that? Suddenly nogil is much slower than 3.9 tip. No threads
are in use. I thought perhaps the nogil run somehow didn't use Sam's VM
improvements, so I disassembled the two versions of vecmul. I won't bore
you with the entire dis.dis output, but suffice it to say that Sam's
instruction set appears to be in play:

(base) nogil_build% PYTHONPATH=$HOME/tmp ./bin/python3
Python 3.9.0a4+ (heads/nogil:b0ee2c4740, Oct 30 2021, 16:23:03)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import matmul, dis
>>> dis.dis(matmul.vecmul)
 26           0 FUNC_HEADER             11 (11)

 28           2 LOAD_CONST               2 (0.0)
              4 STORE_FAST               2 (result)

 29           6 LOAD_GLOBAL              3 254 ('len'; 254)
              9 STORE_FAST               8 (.t3)
             11 COPY                     9 0 (.t4 <- a)
             14 CALL_FUNCTION            9 1 (.t4 to .t5)
             18 STORE_FAST               5 (.t0)
...

So I unboxed the two numpy arrays once and used lists of lists for the
actual work. The nogil version still performs worse by about a factor of
two:

(base) nogil_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m9.537s
user 0m9.525s
sys 0m0.012s

(base) 3.9_build% time ./bin/python3 ~/tmp/matmul.py 0 10
a: (160, 625) b: (625, 320) result: (160, 320) -> 51200

real 0m4.836s
user 0m5.109s
sys 0m0.365s

Still scratching my head and am open to suggestions about what to try next.
If anyone is playing along from home, I've updated my script:

https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d

I'm sure there are things I could have done more efficiently, but I would
think both Python versions would be similarly penalized by dumb s**t I've
done.

Skip


___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/4JSJFOWQPZHUAUGDVRGIU6LTF7QNXTLD/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-31 Thread Christopher Barker

On Fri, Oct 29, 2021 at 6:10 AM Skip Montanaro 
wrote:

> 1. I use numpy arrays filled with random values, and the output array is
> also a numpy array. The vector multiplication is done in a simple for loop
> in my vecmul() function.
>

probably doesn't make a difference for this exercise, but numpy arrays make
lousy replacements for a regular list -- i.e. as a container alone. The
issue is that floats need to be "boxed" and "unboxed" as you put them in
and pull them out of an array, whereas with lists, the float objects
themselves are already there.

OK, maybe not as bad as I remember, but not great:

In [61]: def multiply(vect, scalar, out):
    ...:     """
    ...:     multiply all the elements in vect by a scalar, writing the results into out
    ...:     """
    ...:     for i, val in enumerate(vect):
    ...:         out[i] = val * scalar
    ...:

In [62]: arr = np.random.random((10,))

In [63]: arrout = np.zeros_like(arr)

In [64]: l = list(arr)

In [65]: lout = [None] * len(l)

In [66]: %timeit multiply(arr, 1.1, arrout)
19.3 ms ± 125 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [67]: %timeit multiply(l, 1.1, lout)
12.8 ms ± 83.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
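The standard-library `array` module, which stores raw C doubles, can show the same boxing effect without NumPy. This is a substitution, not the measurement above. Reading an element of an `array('d')` creates a new float object and writing one unpacks it, much as indexing a NumPy float64 array does. A list, by contrast, just hands back references to existing float objects.

```python
import array
import timeit


def multiply(vect, scalar, out):
    """Write vect[i] * scalar into out[i] for every element."""
    for i, val in enumerate(vect):
        out[i] = val * scalar


values = [i / 10 for i in range(100_000)]
lst, lst_out = list(values), [0.0] * len(values)
# array('d') holds unboxed C doubles, so every read allocates a fresh float.
arr, arr_out = array.array("d", values), array.array("d", [0.0]) * len(values)

if __name__ == "__main__":
    for label, src, dst in (("list", lst, lst_out), ("array('d')", arr, arr_out)):
        best = min(timeit.repeat(lambda: multiply(src, 1.1, dst), number=10, repeat=5))
        print(f"{label:12} {best / 10 * 1e3:.2f} ms per call")
```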

> That said, I have now run my example code using both PYTHONGIL=0 and
> PYTHONGIL=1 of Sam's nogil branch as well as the following other Python3
> versions:
>
> * Conda Python3 (3.9.7)
> * /usr/bin/python3 (3.9.1 in my case)
> * 3.9 branch tip (3.9.7+)
>
> The results were confusing, so I dredged up a copy of pystone to make sure
> I wasn't missing anything w.r.t. basic execution performance. I'm still
> confused, so will keep digging.

I'll be interested to see what you find out :-)

>> It would also be fun to see David Beazley’s example from his seminal talk:
>>
>> https://youtu.be/ph374fJqFPE
>>

> Thanks, I'll take a look when I get a chance

That may not be the best source of the talk -- just the one I found first
:-)

-CHB

-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/GY7RWKFOPQFGTGD7IUN5JS6FYNXYM22I/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-29 Thread Skip Montanaro

>
> Did you try running the same code with stock Python?
>
> One reason I ask is that, IIUC, you are using numpy for the individual
> vector operations, and numpy already releases the GIL in some
> circumstances.
>

I had not run the same code with stock Python (but see below). Also, I only
used numpy for two bits:

1. I use numpy arrays filled with random values, and the output array is
also a numpy array. The vector multiplication is done in a simple for loop
in my vecmul() function.

2. Early on I compared my results with the result of numpy.matmul just to
make sure I had things right.

That said, I have now run my example code using both PYTHONGIL=0 and
PYTHONGIL=1 of Sam's nogil branch as well as the following other Python3
versions:

* Conda Python3 (3.9.7)
* /usr/bin/python3 (3.9.1 in my case)
* 3.9 branch tip (3.9.7+)
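A small driver could compare interpreters and GIL settings on the same script. The sketch below rests on some assumptions. `time_run` is a hypothetical helper, and the `-c` workload is a placeholder for the real script. Per this thread, Sam's nogil branch is run with PYTHONGIL=0 or PYTHONGIL=1, while a stock CPython build simply ignores that variable.

```python
import os
import subprocess
import sys
import time


def time_run(python, args, extra_env=None) -> float:
    """Run `python *args` in a child process and return its wall-clock time."""
    env = dict(os.environ, **(extra_env or {}))
    start = time.perf_counter()
    subprocess.run([python, *args], env=env, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder workload; substitute the real script and its arguments.
    workload = ["-c", "sum(i * i for i in range(2_000_000))"]
    for setting in ("0", "1"):
        t = time_run(sys.executable, workload, {"PYTHONGIL": setting})
        print(f"PYTHONGIL={setting}: {t:.3f} s")
```

To compare several builds, pass each interpreter's path in place of `sys.executable`.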

The results were confusing, so I dredged up a copy of pystone to make sure
I wasn't missing anything w.r.t. basic execution performance. I'm still
confused, so will keep digging.

> It would also be fun to see David Beazley’s example from his seminal talk:
>
> https://youtu.be/ph374fJqFPE
>

Thanks, I'll take a look when I get a chance. Might give me the excuse I
need to wake up extra early and tag along with Dave on an early morning
bike ride.

Skip
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/YZYJIDFH6Y3YCD3LCBQPRDQXN2JGJA7N/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-28 Thread Christopher Barker

Thanks, Skip. Nice to see some examples.

Did you try running the same code with stock Python?

One reason I ask is that, IIUC, you are using numpy for the individual
vector operations, and numpy already releases the GIL in some
circumstances.

It would also be fun to see David Beazley’s example from his seminal talk:


https://youtu.be/ph374fJqFPE
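The CPU-bound "countdown" experiment usually associated with Beazley's GIL talks can be sketched as below. Whether the linked video uses exactly this code is an assumption. Two threads each run a pure-Python loop, and the timing is compared with running the same two loops back to back.

```python
import threading
import time


def countdown(n: int) -> None:
    """Pure-Python CPU-bound loop with no I/O; under the GIL only one thread
    can execute it at any given moment."""
    while n > 0:
        n -= 1


def sequential(n: int) -> float:
    start = time.perf_counter()
    countdown(n)
    countdown(n)
    return time.perf_counter() - start


def threaded(n: int) -> float:
    start = time.perf_counter()
    threads = [threading.Thread(target=countdown, args=(n,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    N = 5_000_000
    print(f"sequential: {sequential(N):.2f} s")
    print(f"2 threads:  {threaded(N):.2f} s")
```

With the GIL, the threaded version is not expected to beat the sequential one, and it may even be slower on a multicore machine. On a free-threaded build it would ideally take about half as long.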

-CHB



On Thu, Oct 28, 2021 at 3:55 AM Skip Montanaro 
wrote:

> Guido> To be clear, Sam’s basic approach is a bit slower for
> single-threaded code, and he admits that. But to sweeten the pot he has
> also applied a bunch of unrelated speedups that make it faster in general,
> so that overall it’s always a win. But presumably we could upstream the
> latter easily, separately from the GIL-freeing part.
>
> Something just occurred to me. If you upstream all the other goodies
> (register VM, etc), when the time comes to upstream the no-GIL parts won't
> the complaint then be (again), "but it's slower for single-threaded
> code!" ? ;-)
>
> Onto other things. For about as long as I can remember, the biggest knock
> against Python was, "You can never do any serious multi-threaded
> programming with it. It has this f**king GIL!" I know that attempts to
> remove it have been made multiple times, beginning with (I think) Greg
> Stein in the 1.4 timeframe. In my opinion, Sam's work finally solves the
> problem.
>
> Not being a serious parallel programming person (I have used
> multi-threading a bit in Python, but only for obviously I/O-bound tasks), I
> thought it might be instructive — for me, at least — to kick the no-GIL
> tires a bit. Not having any obvious application in mind, I decided to
> implement a straightforward parallel matrix multiply. (I think I wrote
> something similar back in the mid-80s in a now defunct Smalltalk-inspired
> language while at GE.) Note that this was just for my own edification. I
> have no intention of trying to supplant numpy.matmul() or anything like
> that. It splits up the computation in the most straightforward (to me)
> way, handing off the individual vector multiplications to a variable
> sized thread pool. The code is here:
>
> https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d
>
> Here is a graph of some timings. My machine is a now decidedly
> long-in-the-tooth Dell Precision 5520 with a 7th Gen Core i7 processor
> (four cores + hyperthreading). The data for the graph come from the
> built-in bash time(1) command. As expected, wall clock time drops as you
> increase the number of cores until you reach four. After that, nothing
> improves, since the logical HT cores don't actually have their own ALU
> (just instruction fetch/decode I think). The slope of the real time
> improvement from two cores to four isn't as great as one to two, probably
> because I wasn't careful about keeping the rest of the system quiet. It was
> running my normal mix, Brave with many open tabs + Emacs. I believe I used
> A=240x3125, B=3125x480, giving a 240x480 result, so 115200 vector multiplies.
>
> [image: matmul.png]
>
> All-in-all, I think Sam's effort is quite impressive. I got things going
> in fits and starts, needing a bit of help from Sam and Vadym Stupakov
> to get the modified numpy implementation (crosstalk between my usual Conda
> environment and the no-GIL stuff). I'm sure there are plenty of problems
> yet to be solved related to extension modules, but I trust smarter people
> than me can solve them without a lot of fuss. Once nogil is up-to-date with
> the latest 3.9 release I hope these changes can start filtering into main.
> Hopefully that means a 3.11 release. In fact, I'd vote for pushing back the
> usual release cycle to accommodate inclusion. Sam has gotten this so close
> it would be a huge disappointment to abandon it now. The problems faced at
> this point would have been amortized over years of development if the GIL
> had been removed 20 years ago. I say go for it.
>
> Skip
>
-- 
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/CR2B3H2WKE6CEHUT22P263F2F4L7F3FU/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-28 Thread Skip Montanaro

Guido> To be clear, Sam’s basic approach is a bit slower for
single-threaded code, and he admits that. But to sweeten the pot he has
also applied a bunch of unrelated speedups that make it faster in general,
so that overall it’s always a win. But presumably we could upstream the
latter easily, separately from the GIL-freeing part.

Something just occurred to me. If you upstream all the other goodies
(register VM, etc), when the time comes to upstream the no-GIL parts won't
the complaint then be (again), "but it's slower for single-threaded code!"? ;-)

Onto other things. For about as long as I can remember, the biggest knock
against Python was, "You can never do any serious multi-threaded
programming with it. It has this f**king GIL!" I know that attempts to
remove it have been made multiple times, beginning with (I think) Greg
Stein in the 1.4 timeframe. In my opinion, Sam's work finally solves the
problem.

Not being a serious parallel programming person (I have used
multi-threading a bit in Python, but only for obviously I/O-bound tasks), I
thought it might be instructive — for me, at least — to kick the no-GIL
tires a bit. Not having any obvious application in mind, I decided to
implement a straightforward parallel matrix multiply. (I think I wrote
something similar back in the mid-80s in a now defunct Smalltalk-inspired
language while at GE.) Note that this was just for my own edification. I
have no intention of trying to supplant numpy.matmul() or anything like that.
It splits up the computation in the most straightforward (to me) way,
handing off the individual vector multiplications to a variable-sized
thread pool. The code is here:

https://gist.github.com/smontanaro/80f788a506d2f41156dae779562fd08d
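For readers who don't follow the gist link, here is a minimal sketch of the same decomposition. It is not Skip's code: the helper names `dot` and `parallel_matmul` are made up for illustration, and matrices are plain lists of lists. Each cell of the result is one row-by-column dot product, submitted as a task to a `concurrent.futures.ThreadPoolExecutor` whose size is the `workers` parameter. On a GIL build the pure-Python arithmetic in these tasks is serialized; the premise of the experiment is that a no-GIL build can run them in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def dot(row, col):
    """Multiply two equal-length vectors element-wise and sum the products."""
    return sum(x * y for x, y in zip(row, col))


def parallel_matmul(a, b, workers=4):
    """Compute a @ b for lists of lists, one dot product per pool task.

    a is n x k and b is k x m; the result is n x m.
    """
    k, m = len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    # Transpose b once so each column is available as a list.
    cols = [list(col) for col in zip(*b)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per (i, j) cell of the result.
        futures = [[pool.submit(dot, row, cols[j]) for j in range(m)]
                   for row in a]
        # Collect results before the pool shuts down.
        return [[f.result() for f in row] for row in futures]


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(parallel_matmul(a, b, workers=2))  # [[58, 64], [139, 154]]
```

Submitting one task per cell keeps the code simple but adds per-task overhead; handing each worker a whole row would be a coarser-grained alternative.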

Here is a graph of some timings. My machine is a now decidedly
long-in-the-tooth Dell Precision 5520 with a 7th Gen Core i7 processor
(four cores + hyperthreading). The data for the graph come from the
built-in bash time(1) command. As expected, wall clock time drops as you
increase the number of cores until you reach four. After that, nothing
improves, since the logical HT cores don't actually have their own ALU
(just instruction fetch/decode I think). The slope of the real time
improvement from two cores to four isn't as great as one to two, probably
because I wasn't careful about keeping the rest of the system quiet. It was
running my normal mix, Brave with many open tabs + Emacs. I believe I used
A=240x3125, B=3125x480, giving a 240x480 result, so 115200 vector multiplies.

[image: matmul.png]
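The dimensions quoted above fix the amount of work: one dot product per element of the 240x480 result, each of length 3125. A quick check of the arithmetic:

```python
# Dimensions as given in the message: A is 240x3125, B is 3125x480.
rows_a, inner, cols_b = 240, 3125, 480

# One dot product per element of the 240x480 result.
dot_products = rows_a * cols_b
# Each dot product performs `inner` multiplications.
multiplications = dot_products * inner

print(dot_products)     # 115200
print(multiplications)  # 360000000
```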

All-in-all, I think Sam's effort is quite impressive. I got things going in
fits and starts, needing a bit of help from Sam and Vadym Stupakov
to get the modified numpy implementation working (there was crosstalk between
my usual Conda environment and the no-GIL stuff). I'm sure there are plenty of problems
yet to be solved related to extension modules, but I trust smarter people
than me can solve them without a lot of fuss. Once nogil is up-to-date with
the latest 3.9 release I hope these changes can start filtering into main.
Hopefully that means a 3.11 release. In fact, I'd vote for pushing back the
usual release cycle to accommodate inclusion. Sam has gotten this so close
it would be a huge disappointment to abandon it now. The problems faced at
this point would have been amortized over years of development if the GIL
had been removed 20 years ago. I say go for it.

Skip
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/WBLU6PZ2RDPEMG3ZYBWSAXUGXCJNFG4A/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-18 Thread Skip Montanaro

Mohamed> I love everything about this - but I expect some hesitancy
due to this "Multithreaded programs are prone to concurrency bugs.".

Paul> The way I see it, the concurrency model to be used is selected
by developers. They can choose between ...

I think the real intent of the statement Mohamed quoted is that just
because your program works in a version of Python with the GIL doesn't
mean it will work unchanged in a GIL-free world. As we all know, the
GIL can hide a multitude of sins. I could be paraphrasing Tim Peters
here without realizing it explicitly. It kinda sounds like something
he might say.
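One classic sin the GIL papers over: `x += 1` on shared state is a read, an add and a write, not a single atomic step. Whether an unlocked version actually loses updates on a given build depends on the interpreter version and on scheduling, so the sketch below only shows the defensive pattern. The `Counter` class and `hammer` helper are illustrative names, not part of any library.

```python
import threading


class Counter:
    """A shared counter whose locked increments are serialized."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment_unsafe(self):
        # Read, add, write: another thread can run between the read and
        # the write, so updates can be lost (with or without the GIL).
        self.value += 1

    def increment(self):
        # Holding the lock makes the read-modify-write one step as far as
        # other threads calling this method are concerned.
        with self._lock:
            self.value += 1


def hammer(counter, method, threads=8, per_thread=10_000):
    """Call counter.<method>() per_thread times from each of several threads."""
    def work():
        fn = getattr(counter, method)
        for _ in range(per_thread):
            fn()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.value


print(hammer(Counter(), "increment"))  # 80000
```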

Skip
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4OWK2DQKQOZZDPNWA7KC3NAUTWOBFOND/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-18 Thread Paul Bryan

The way I see it, the concurrency model to be used is selected by
developers. They can choose between multi-threading, multi-process, or
asyncio, or even a hybrid. If developers select multithreading, then
they carry the burden of ensuring mutual exclusion and avoiding race
conditions, deadlocks, livelocks, etc.
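One way to keep that choice open is to write the work against the `concurrent.futures.Executor` interface, so the same code can be driven by threads or by processes. A minimal sketch with a made-up CPU-bound function: `ThreadPoolExecutor` is shown, and `ProcessPoolExecutor` could be passed instead, provided the function is defined at module top level (so it can be pickled) and the entry point is guarded by `if __name__ == "__main__":`. The example sidesteps the locking burden by sharing no mutable state between tasks.

```python
from concurrent.futures import Executor, ThreadPoolExecutor


def count_primes_below(n):
    """CPU-bound work: count primes < n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_batch(executor: Executor, limits):
    """Run count_primes_below over limits using whichever executor is given."""
    return list(executor.map(count_primes_below, limits))


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=3) as ex:
        print(run_batch(ex, [10, 100, 1000]))  # [4, 25, 168]
```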


On Mon, 2021-10-18 at 13:17 +, Mohamed Koubaa wrote:
> I love everything about this - but I expect some hesitancy due to
> this "Multithreaded programs are prone to concurrency bugs.".
> 
> If there is significant pushback, I have one suggestion:
> 
> Would it be helpful to think of the python concurrency mode as a
> property of interpreters?
> `interp = interpreters.create(concurrency_mode=interpreters.GIL)`
> or 
> `interp = interpreters.create(concurrency_mode=interpreters.NOGIL)`
> 
> and subsequently python _environments_ can make different choices
> about what to use for the 0th interpreter, via some kind of
> configuration.
> Python modules can declare which concurrency modes they support. 
> Future concurrency modes that address specific use cases could be
> added.
> 
> This would allow Python environments that would rather not audit their
> code for concurrency issues to opt out, and allow incremental
> adoption.  I can't intuit whether this indirection would cause a
> performance problem in the C implementation or if there is some
> clever way to have different variants of relevant objects at compile
> time and switch between them based on the interpreter concurrency
> mode.

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/36ENPZV6W3NXIUS3TPU4MQ235B2IF5XF/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-18 Thread Mohamed Koubaa

I love everything about this - but I expect some hesitancy due to this 
"Multithreaded programs are prone to concurrency bugs.".

If there is significant pushback, I have one suggestion:

Would it be helpful to think of the python concurrency mode as a property of 
interpreters?
`interp = interpreters.create(concurrency_mode=interpreters.GIL)`
or 
`interp = interpreters.create(concurrency_mode=interpreters.NOGIL)`

and subsequently python _environments_ can make different choices about what to 
use for the 0th interpreter, via some kind of configuration.
Python modules can declare which concurrency modes they support.  Future 
concurrency modes that address specific use cases could be added.

This would allow Python environments that would rather not audit their code for 
concurrency issues to opt out, and allow incremental adoption.  I can't intuit 
whether this indirection would cause a performance problem in the C 
implementation or if there is some clever way to have different variants of 
relevant objects at compile time and switch between them based on the 
interpreter concurrency mode.
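This proposal is hypothetical: `interpreters.create(concurrency_mode=...)` is not an existing API. The sketch below models only the environment-level decision described above, under invented names (`ConcurrencyMode`, a `DECLARED` table standing in for per-module declarations, and `choose_mode`): use the preferred mode only if every loaded module declares support for it, otherwise fall back to GIL mode.

```python
from enum import Enum


class ConcurrencyMode(Enum):
    GIL = "gil"
    NOGIL = "nogil"


# Hypothetical per-module declarations; in a real design these might come
# from a module attribute, which is invented here for illustration.
DECLARED = {
    "pure_python_lib": {ConcurrencyMode.GIL, ConcurrencyMode.NOGIL},
    "legacy_extension": {ConcurrencyMode.GIL},
}


def choose_mode(modules, preferred=ConcurrencyMode.NOGIL):
    """Pick the preferred mode if every module supports it, else GIL.

    Modules with no declaration are treated as GIL-only, the conservative
    choice for code nobody has audited for concurrency issues.
    """
    for name in modules:
        supported = DECLARED.get(name, {ConcurrencyMode.GIL})
        if preferred not in supported:
            return ConcurrencyMode.GIL
    return preferred


print(choose_mode(["pure_python_lib"]))                      # ConcurrencyMode.NOGIL
print(choose_mode(["pure_python_lib", "legacy_extension"]))  # ConcurrencyMode.GIL
```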
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZUEWHEOW34MNHKOY2TLTFI4LHYJX4YDW/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-13 Thread Simon Cross

> Still, I hope you at least enjoyed my enthusiasm!

I did!
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/BJFDVRCZMEDOHEMCCIJJP6NTX6HOGC5L/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-12 Thread Larry Hastings

Oops! Sorry everybody, I meant that to be off-list.

Still, I hope you at least enjoyed my enthusiasm!


/arry

On Tue, Oct 12, 2021, 12:55 Larry Hastings  wrote:

>
> (off-list)
>
>
> On 10/11/21 2:09 PM, Sam Gross wrote:
>
> The ccbench results look pretty good: about 18.1x speed-up on "pi
> calculation" and 19.8x speed-up on "regular expression" with 20 threads
> (turbo off). The latency and throughput results look good too.
>
>
> JESUS CHRIST
>
>
>
> */arry*
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/H4QIVWY7RB4A765FYH2JVKJM52V42B4U/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-12 Thread Larry Hastings



(off-list)


On 10/11/21 2:09 PM, Sam Gross wrote:
> The ccbench results look pretty good: about 18.1x speed-up on "pi 
> calculation" and 19.8x speed-up on "regular expression" with 20 
> threads (turbo off). The latency and throughput results look good too.



JESUS CHRIST



/arry

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CKRGEP64K4YCGV2KJEIO4NN7FASB5ZJA/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Barry Warsaw

Thank you Sam, this additional detail really helps me understand your proposal.

-Barry

> On Oct 11, 2021, at 12:06, Sam Gross  wrote:
> 
>> I’m unclear what is actually retried.  You use this note throughout the 
>> document, so I think it would help to clarify exactly what is retried and why 
>> that solves the particular problem.  I’m confused because, is it the refcount 
>> increment that’s retried or the entire sequence of steps (i.e. do you go back 
>> and reload the address of the item)?  Is there some kind of waiting period 
>> before the retry?  I would infer that if you’re retrying the refcount 
>> incrementing, it’s because you expect subsequent retries to transition from 
>> zero to non-zero, but is that guaranteed?  Are there possibilities of 
>> deadlocks or race conditions?
> 
> The entire operation is retried (not just the refcount). For "dict", this 
> means going back to step 1 and reloading the version tag and 
> PyDictKeysObject. The operation can fail (and need to be retried) only when 
> some other thread is concurrently modifying the dict. The reader needs to 
> perform the checks (and retry) to avoid returning inconsistent data, such as 
> an object that was never in the dict. With the checks and retry, returning 
> inconsistent or garbage data is not possible.
> 
> The retry is performed after locking the dict, so the operation is retried at 
> most once -- the read operation can't fail when it holds the dict's lock 
> because the lock prevents concurrent modifications. It would have also been 
> possible to retry the operation in a loop without locking the dict, but I was 
> concerned about reader starvation. (In the doc I wrote "livelock", but 
> "reader starvation" is more accurate.) In particular, I was concerned that a 
> thread repeatedly modifying a dict might prevent other threads reading the 
> dict from making progress. I hadn't seen this in practice, but I'm aware that 
> reader starvation can be an issue for similar designs like Linux's seqlock. 
> Acquiring the dict's lock when retrying avoids the reader starvation issue.
> 
> Deadlock isn't possible because the code does not acquire any other locks 
> while holding the dict's lock. For example, the code releases the dict's lock 
> before calling Py_DECREF or PyObject_RichCompareBool.
> 
> The race condition question is a bit harder to answer precisely. Concurrent 
> reads and modifications of a dict won't cause the program to segfault, return 
> garbage data, or items that were never in the dict.
> 
> Regards,
> Sam
> 
> 



___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/65OZPNMIZ2INHEIKTW7H65SORRYDK4D4/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

I've updated the linked gists with the results from interpreters compiled
with PGO, so the numbers have slightly changed.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ESCEXN7HKL3GICHOHZMQTTHUDQN5WUYX/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

On Mon, Oct 11, 2021 at 7:04 AM Antoine Pitrou  wrote:

> It's crude, but you can take a look at `ccbench` in the Tools directory.
>

Thanks, I wasn't familiar with this. The ccbench results look pretty good:
about 18.1x speed-up on "pi calculation" and 19.8x speed-up on "regular
expression" with 20 threads (turbo off). The latency and throughput results
look good too. With the GIL enabled (3.11), the compute intensive
background task increases latency and dramatically decreases throughput.
With the GIL disabled, latency remains low and throughput high.
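Another way to read these numbers is parallel efficiency: the speedup divided by the thread count, where 1.0 would be perfect linear scaling. The metric is not used in the message itself; this is just a restatement of the two ccbench figures above.

```python
def parallel_efficiency(speedup, threads):
    """Fraction of ideal linear scaling achieved: speedup / thread count."""
    return speedup / threads


# ccbench figures reported above: 20 threads, turbo off.
for task, speedup in [("pi calculation", 18.1), ("regular expression", 19.8)]:
    print(f"{task}: {parallel_efficiency(speedup, 20):.1%}")
# pi calculation: 90.5%
# regular expression: 99.0%
```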

Here are the full results for 20 threads without the GIL:
https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil,
20 threads)

Here are the results for 4 threads (the default) for comparison with
upstream:
https://gist.github.com/colesbury/8479ee0246558fa1ab0f49e4c01caeed (nogil,
4 threads)
https://gist.github.com/colesbury/c0b89f82e51779670265fb7c7cd37114
(3.11/b108db63e0, 4 threads)
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WCMIVNQ6DNOTZUUX4EX43LF2VJPF4ALW/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Thomas Grainger

I have a PR to remove this FAQ entry: 
https://github.com/python/cpython/pull/28886
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/YP3QZ7ZLMMQUAWVQRGAGNNETA6IDXP4P/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Pablo Galindo Salgado

As far as I understand, we should get a smaller improvement on single-threaded
code, because some of the optimizations listed in this work are already
partially or totally implemented.

This is excluding any non-linear behaviour between the different
optimizations, of course, and assuming that both versions yield the same
numbers.

On Mon, 11 Oct 2021, 20:28 Abdur-Rahmaan Janhangeer, 
wrote:

> When you say "an order of magnitude less overhead than the current
> CPython implementation", do you mean compared with the main branch? We
> have recently implemented almost everything listed in this paragraph:
>
> https://github.com/python/cpython/pull/27077
>
> We also pack some extra similar optimizations in this other PR, including
> stealing the frame arguments from python to python calls:
>
> https://github.com/python/cpython/pull/28488
>
> This could explain why the performance is closer to the current master
> branch as you indicate:
>
>
> Does this mean that if we remove the GIL and add the 3.11 improvements,
> we should get some more speed?
>
> (Or are those already integrated in the POC?)
>
>
> Kind Regards,
>
> Abdur-Rahmaan Janhangeer
> Mauritius
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/MSNUMB5L3KS55HHAEMQZLFOM6JL3RL2B/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

On Mon, Oct 11, 2021 at 12:58 PM Thomas Grainger  wrote:

> Is D1.update(D2) still atomic with this implementation?
> https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
>

No. For example, another thread reading from the dict concurrently may
observe a partial update.

As Ronald Oussoren points out, dict.update isn't atomic in the general
case. CPython even includes some checks for concurrent modifications:
https://github.com/python/cpython/blob/2f92e2a590f0e5d2d3093549f5af9a4a1889eb5a/Objects/dictobject.c#L2582-L2586
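Code that needs a multi-key update to appear atomic to other threads (for example, the invariant that keys "a" and "b" always hold the same value) can get it on GIL and no-GIL builds alike by guarding both writers and readers with one explicit lock. A minimal sketch; `LockedDict` is an illustrative name, not a standard class.

```python
import threading


class LockedDict:
    """A dict wrapper where multi-key updates and snapshots share one lock."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def update(self, other):
        # Readers using snapshot() never observe a half-applied update.
        with self._lock:
            self._data.update(other)

    def snapshot(self):
        with self._lock:
            return dict(self._data)


d = LockedDict()
d.update({"a": 0, "b": 0})
seen_mismatch = False


def writer(value):
    for _ in range(1000):
        d.update({"a": value, "b": value})


def reader():
    global seen_mismatch
    for _ in range(1000):
        snap = d.snapshot()
        if snap["a"] != snap["b"]:
            seen_mismatch = True


threads = [threading.Thread(target=writer, args=(v,)) for v in (1, 2)]
threads.append(threading.Thread(target=reader))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen_mismatch)  # False
```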
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/IAOCDDCJ653NBED3G2J2YBWD7HHPFHT6/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Abdur-Rahmaan Janhangeer

When you say "an order of magnitude less overhead than the current CPython
implementation", do you mean compared with the main branch? We have recently
implemented almost everything listed in this paragraph:

https://github.com/python/cpython/pull/27077

We also pack some extra similar optimizations in this other PR, including
stealing the frame arguments from python to python calls:

https://github.com/python/cpython/pull/28488

This could explain why the performance is closer to the current master
branch as you indicate:


Does this mean that if we remove the GIL and add the 3.11 improvements, we
should get some more speed?

(Or are those already integrated in the POC?)


Kind Regards,

Abdur-Rahmaan Janhangeer
Mauritius
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UKO7C45UCVVIJUUBOSPZLUNR4CC3WNIL/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

>
> I’m unclear what is actually retried.  You use this note throughout the
> document, so I think it would help to clarify exactly what is retried and
> why that solves the particular problem.  I’m confused because, is it the
> refcount increment that’s retried or the entire sequence of steps (i.e. do
> you go back and reload the address of the item)?  Is there some kind of
> waiting period before the retry?  I would infer that if you’re retrying the
> refcount incrementing, it’s because you expect subsequent retries to
> transition from zero to non-zero, but is that guaranteed?  Are there
> possibilities of deadlocks or race conditions?


The entire operation is retried (not just the refcount). For "dict", this
means going back to step 1 and reloading the version tag and
PyDictKeysObject. The operation can fail (and need to be retried) only when
some other thread is concurrently modifying the dict. The reader needs to
perform the checks (and retry) to avoid returning inconsistent data, such
as an object that was never in the dict. With the checks and retry,
returning inconsistent or garbage data is not possible.

The retry is performed after locking the dict, so the operation is retried
at most once -- the read operation can't fail when it holds the dict's lock
because the lock prevents concurrent modifications. It would have also been
possible to retry the operation in a loop without locking the dict, but I
was concerned about reader starvation. (In the doc I wrote "livelock", but
"reader starvation" is more accurate.) In particular, I was concerned that
a thread repeatedly modifying a dict might prevent other threads reading
the dict from making progress. I hadn't seen this in practice, but I'm
aware that reader starvation can be an issue for similar designs like
Linux's seqlock. Acquiring the dict's lock when retrying avoids the reader
starvation issue.

Deadlock isn't possible because the code does not acquire any other
locks while holding the dict's lock. For example, the code releases the
dict's lock before calling Py_DECREF or PyObject_RichCompareBool.

The race condition question is a bit harder to answer precisely. Concurrent
reads and modifications of a dict won't cause the program to segfault,
return garbage data, or items that were never in the dict.
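The read protocol can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not the C implementation in the nogil fork: a seqlock-style version counter (odd while a write is in progress) stands in for the dict's version tag, reference counting is omitted entirely, keys are assumed to have builtin `__eq__`/`__hash__`, and `VersionedDict` is a hypothetical name. What it illustrates is the control flow: an optimistic lock-free read, a version check, and at most one retry under the lock.

```python
import threading


class VersionedDict:
    """Toy model: optimistic reads plus a single locked retry.

    Writers take the lock and bump the version before and after mutating,
    so the version is odd while a write is in progress.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0
        self._data = {}

    def set(self, key, value):
        with self._lock:
            self._version += 1  # now odd: write in progress
            self._data[key] = value
            self._version += 1  # even again: write finished

    def get(self, key, default=None):
        # Fast path: read without the lock, then check nothing changed.
        start = self._version
        if start % 2 == 0:
            value = self._data.get(key, default)
            if self._version == start:
                return value
        # Slow path: retry once while holding the lock. Writers need the same
        # lock, so this read cannot fail and a busy writer cannot starve it.
        # (Assumes simple keys; the real code releases the lock before calling
        # out to comparison code.)
        with self._lock:
            return self._data.get(key, default)


cache = VersionedDict()
cache.set("answer", 42)
print(cache.get("answer"))  # 42
```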

Regards,
Sam
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/D37MQCDRXRVLDVZ65G5BJPJ6QEPSVLI4/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Ronald Oussoren via Python-Dev



> On 11 Oct 2021, at 18:58, Thomas Grainger  wrote:
> 
> Is D1.update(D2) still atomic with this implementation?  
> https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe
>  
> 
AFAIK this is already atomic only in specific circumstances, which are more 
limited than the FAQ appears to claim.

For dict.update to be atomic I’d expect that with two threads performing an 
update on the same keys you’d end up with the update of either thread, but not a mix.

That is:

Thread 1:  d.update({"a": 1, "b": 1})
Thread 2:  d.update({"a": 2, "b": 2})

The result should have d["a"] == d["b"].

This can already end up with a mix of the two when “d” has keys that are 
objects that implement __eq__ in Python, because the interpreter could switch 
threads while interpreting __eq__. 

A pathological example:

# — start of script —

import threading
import time

stop = False
trigger = False
def runfunc():
    while not stop:
        if trigger:
            d.update({"a": 2, "b": 2 })
            print(d)
            break

t = threading.Thread(target=runfunc)
t.start()


class X(str):
    def __eq__(self, other):
        if threading.current_thread() is t:
            return str.__eq__(self, other)

        global trigger
        trigger = True
        t.join()
        return str.__eq__(self, other)

    def __hash__(self):
        return str.__hash__(self)


d = {X("b"):0}
print("before", d)
d.update({"a":1, "b": 1})
print("after", d)

stop = True
t.join()

# — end of script — 

This prints "after {'b': 1, 'a': 2}" on my machine.

Ronald


> 
> On Mon, 11 Oct 2021, 17:54 Sam Gross wrote:
> On Fri, Oct 8, 2021 at 12:04 PM Nathaniel Smith wrote:
> I notice the fb.com address -- is this a personal project or something
> facebook is working on? what's the relationship to Cinder, if any?
> 
> It is a Facebook project, at least in the important sense that I work on it
> as an employee at Facebook. (I'm currently the only person working on it.)
> I keep in touch with some of the Cinder devs regularly and they've advised
> on the project, but otherwise the two projects are unrelated.
>  
> Regarding the tricky lock-free dict/list reads: I guess the more
> straightforward approach would be to use a plain ol' mutex that's
> optimized for this kind of fine-grained per-object lock with short
> critical sections and minimal contention, like WTF::Lock. Did you try
> alternatives like that? If so, I assume they didn't work well -- can
> you give more details?
> 
> I'm using WTF::Lock style locks for dict/list mutations. I did an experiment
> early on where I included locking around reads as well. I think it slowed down
> the pyperformance benchmarks by ~10% on average, but I can't find my notes
> so I plan to re-run the experiment.
> 
> Additionally, because dicts are used for things like global variables, I'd expect
> that locks around reads prevent efficient scaling, but I haven't measured this.
> 

—

Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/

___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/4ECNMYHYOOPNL4XHE4GBB5AQN6NPX7QX/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

On Fri, Oct 8, 2021 at 11:35 AM Chris Jerdonek 
wrote:

> Is it also slower even when running with PYTHONGIL=1? If it could be made
> the same speed for single-threaded code when running in GIL-enabled mode,
> that might be an easier intermediate target while still adding value.
>

Running with PYTHONGIL=1 is a bit less than 1% faster (on pyperformance)
than with PYTHONGIL=0. It might be possible to improve PYTHONGIL=1 by
another 1-2% by adding runtime checks for the GIL before attempting to lock
dicts and lists during mutations. I think further optimizations specific to
the PYTHONGIL=1 use case would be tricky.
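The kind of runtime check mentioned above can be sketched as follows, with the caveat that this illustrates the idea and is not the nogil code: the per-object lock is taken only when no GIL is active. Here `gil_enabled` is an ordinary constructor argument standing in for whatever runtime query the interpreter would use, and `MaybeLockedList` is a hypothetical name.

```python
import threading
from contextlib import nullcontext


class MaybeLockedList:
    """List wrapper that skips its per-object lock when a GIL is active."""

    def __init__(self, gil_enabled):
        self._items = []
        self._gil_enabled = gil_enabled  # stand-in for a runtime GIL check
        self._lock = threading.Lock()

    def _guard(self):
        # When the GIL already serializes the underlying mutation, the extra
        # lock is pure overhead, so return a no-op context manager instead.
        return nullcontext() if self._gil_enabled else self._lock

    def append(self, item):
        with self._guard():
            self._items.append(item)

    def __len__(self):
        return len(self._items)


items = MaybeLockedList(gil_enabled=False)
for n in range(3):
    items.append(n)
print(len(items))  # 3
```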
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6YLZMVKWI77SSNUV5XOGBSRY44KJ76UQ/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Thomas Grainger

Is D1.update(D2) still atomic with this implementation?
https://docs.python.org/3.11/faq/library.html#what-kinds-of-global-value-mutation-are-thread-safe

On Mon, 11 Oct 2021, 17:54 Sam Gross,  wrote:

> On Fri, Oct 8, 2021 at 12:04 PM Nathaniel Smith  wrote:
>
>> I notice the fb.com address -- is this a personal project or something
>> facebook is working on? what's the relationship to Cinder, if any?
>>
>
> It is a Facebook project, at least in the important sense that I work on it
> as an employee at Facebook. (I'm currently the only person working on it.)
> I keep in touch with some of the Cinder devs regularly and they've advised
> on the project, but otherwise the two projects are unrelated.
>
>
>> Regarding the tricky lock-free dict/list reads: I guess the more
>> straightforward approach would be to use a plain ol' mutex that's
>> optimized for this kind of fine-grained per-object lock with short
>> critical sections and minimal contention, like WTF::Lock. Did you try
>> alternatives like that? If so, I assume they didn't work well -- can
>> you give more details?
>>
>
> I'm using WTF::Lock style locks for dict/list mutations. I did an experiment
> early on where I included locking around reads as well. I think it slowed down
> the pyperformance benchmarks by ~10% on average, but I can't find my notes
> so I plan to re-run the experiment.
>
> Additionally, because dicts are used for things like global variables, I'd expect
> that locks around reads prevent efficient scaling, but I haven't measured this.
>
>
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5RKLUR2DYJ53OIRX74WVZCVRGW7VUXLF/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Sam Gross

On Fri, Oct 8, 2021 at 12:04 PM Nathaniel Smith  wrote:

> I notice the fb.com address -- is this a personal project or something
> facebook is working on? what's the relationship to Cinder, if any?
>

It is a Facebook project, at least in the important sense that I work on it
as an employee at Facebook. (I'm currently the only person working on it.)
I keep in touch with some of the Cinder devs regularly and they've advised
on the project, but otherwise the two projects are unrelated.

> Regarding the tricky lock-free dict/list reads: I guess the more
> straightforward approach would be to use a plain ol' mutex that's
> optimized for this kind of fine-grained per-object lock with short
> critical sections and minimal contention, like WTF::Lock. Did you try
> alternatives like that? If so, I assume they didn't work well -- can
> you give more details?
>

I'm using WTF::Lock style locks for dict/list mutations. I did an experiment
early on where I included locking around reads as well. I think it slowed down
the pyperformance benchmarks by ~10% on average, but I can't find my notes
so I plan to re-run the experiment.

Additionally, because dicts are used for things like global variables, I'd expect
that locks around reads prevent efficient scaling, but I haven't measured this.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/V76ZRBM6UMGYU7FTNENMOOW7OYEFYQ5Q/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-11 Thread Antoine Pitrou

On Thu, 7 Oct 2021 15:52:56 -0400
Sam Gross  wrote:
> Hi,
> 
> I've been working on changes to CPython to allow it to run without the
> global interpreter lock. I'd like to share a working proof-of-concept that
> can run without the GIL. The proof-of-concept involves substantial changes
> to CPython internals, but relatively few changes to the C-API. It is
> compatible with many C extensions: extensions must be rebuilt, but usually
> require small or no modifications to source code. I've built compatible
> versions of packages from the scientific Python ecosystem, and they are
> installable through the bundled "pip".
> 
> Source code:
> https://github.com/colesbury/nogil
> 
> Design overview:
> https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

Impressive work!

Just for the record:
"""
It’s harder to measure aggregate multi-threaded performance because
there aren’t any standard multi-threaded Python benchmarks, but the new
interpreter addresses many of the use cases that failed to scale
efficiently.
"""

It's crude, but you can take a look at `ccbench` in the Tools directory.

Regards

Antoine.


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/WRT7F2RHHCQ3N2TYEDC6JSIJ4T2ZM6F7/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-10 Thread Barry Warsaw

Congrats on this impressive work, Sam.  I enjoyed the thorough write-up of the 
design.  There’s one aspect that I don’t quite understand.  Maybe I missed the 
explanation.  For example:

```
• Load the address of the item
• Increment the reference count of the item, if it is non-zero 
(otherwise retry)
• Return the address of the item

(We typically acquire the per-collection mutex when retrying operations to 
avoid potential livelock issues.)
```

I’m unclear what is actually retried.  You use this note throughout the 
document, so I think it would help to clarify exactly what is retried and why 
that solves the particular problem.  I can’t tell whether it is the refcount 
increment that’s retried or the entire sequence of steps (i.e. do you go back 
and reload the address of the item?).  Is there some kind of waiting period 
before the retry?  I would infer that if you’re retrying the refcount 
increment, it’s because you expect a subsequent attempt to see the count go from 
zero to non-zero, but is that guaranteed?  Are there possibilities of deadlocks 
or race conditions?

Can you go into some more detail (here or in the document) about how this works?
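One common way to implement the "increment the reference count of the item, if it is non-zero" step is a compare-and-swap loop. The C11 sketch below assumes a bare atomic counter; `obj_t` and `try_incref_nonzero` are hypothetical names, not taken from the nogil sources. Under this reading, a zero count means the object is already being deallocated and is not expected to become non-zero again, so "retry" would mean reloading the item from the collection rather than waiting on the same object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal object header: only a reference count. */
typedef struct {
    _Atomic size_t refcnt;
} obj_t;

/* Increment op->refcnt only if it is currently non-zero.
 * Returns false when the count is zero, meaning the object is being (or has
 * been) deallocated; the caller must not use the pointer and should restart
 * from the collection instead of waiting for the count to change. */
static bool try_incref_nonzero(obj_t *op)
{
    size_t old = atomic_load_explicit(&op->refcnt, memory_order_relaxed);
    while (old != 0) {
        /* On failure, 'old' is refreshed with the current count and re-tested. */
        if (atomic_compare_exchange_weak(&op->refcnt, &old, old + 1))
            return true;
    }
    return false;
}
```

The loop only repeats when another thread changed the count between the load and the compare-and-swap; it never spins waiting for a zero count to become non-zero.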

Cheers,
-Barry

> On Oct 7, 2021, at 12:52, Sam Gross  wrote:
> 
> Hi,
> 
> I've been working on changes to CPython to allow it to run without the global 
> interpreter lock. I'd like to share a working proof-of-concept that can run 
> without the GIL. The proof-of-concept involves substantial changes to CPython 
> internals, but relatively few changes to the C-API. It is compatible with 
> many C extensions: extensions must be rebuilt, but usually require small or 
> no modifications to source code. I've built compatible versions of packages 
> from the scientific Python ecosystem, and they are installable through the 
> bundled "pip".
> 
> Source code:
> https://github.com/colesbury/nogil
> 
> Design overview:
> https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit
> 
> My goal with the proof-of-concept is to demonstrate that removing the GIL is 
> feasible and worthwhile, and that the technical ideas of the project could 
> serve as a basis of such an effort.
> 
> I'd like to start a discussion about these ideas and gauge the community's 
> interest in this approach to removing the GIL.
> 
> Regards,
> Sam Gross
> colesb...@gmail.com / sgr...@fb.com
> ___
> Message archived at 
> https://mail.python.org/archives/list/python-dev@python.org/message/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/GQOSFIG55U6LZKHO34DS55TGERPX5UVS/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-09 Thread Chris Angelico

On Sun, Oct 10, 2021 at 2:31 PM Dan Stromberg  wrote:
>
>
> On Thu, Oct 7, 2021 at 9:10 PM Chris Angelico  wrote:
>>
>> Concurrency is *hard*. There's no getting around it, there's no
>> sugar-coating it. There are concepts that simply have to be learned,
>> and the failures can be extremely hard to track down. Instantiating an
>> object on the wrong thread can crash GTK, but maybe not immediately.
>> Failing to sleep in one thread results in other threads stalling. I
>> don't think any of this is changed by different modes (with the
>> exception of process-based parallelism, which fixes a lot of
>> concurrency at the cost of explicit IPC), and the more work
>> programmers want their code to do, the more likely that they'll run
>> into this.
>
>
> I'd like to encourage folks not to give up on looking for new, simpler 
> parallelism/concurrency formalisms.
>
> They're out there - consider how well bash does with its parallelism in 
> pipelines.

That's process-based parallelism with unidirectional byte-stream IPC.
It's incredibly specific, but also incredibly useful in its place :)

Simpler parallelism techniques are always possible if you don't need
much interaction between the processes. The challenge isn't making the
simple cases work, but making the harder ones efficient.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UPR7EIZTVKESA2ND4ISFXEYBZYLNSTCE/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-09 Thread Dan Stromberg

On Thu, Oct 7, 2021 at 9:10 PM Chris Angelico  wrote:

> Concurrency is *hard*. There's no getting around it, there's no
> sugar-coating it. There are concepts that simply have to be learned,
> and the failures can be extremely hard to track down. Instantiating an
> object on the wrong thread can crash GTK, but maybe not immediately.
> Failing to sleep in one thread results in other threads stalling. I
> don't think any of this is changed by different modes (with the
> exception of process-based parallelism, which fixes a lot of
> concurrency at the cost of explicit IPC), and the more work
> programmers want their code to do, the more likely that they'll run
> into this.
>

I'd like to encourage folks not to give up on looking for new, simpler
parallelism/concurrency formalisms.

They're out there - consider how well bash does with its parallelism in
pipelines.

The truly general ones may end up looking like Java, but I don't think they
have to be fully general to be useful.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/AVDCKO3OH2SPU54VWAY5AY4HEQWDMPRC/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Guido van Rossum

On Fri, Oct 8, 2021 at 8:55 PM Sam Gross  wrote:

> the "nogil" interpreter stays within the same interpreter loop for many
> Python function calls, while upstream CPython
> recursively calls into _PyEval_EvalFrameDefault.
>

Not for much longer though. https://github.com/python/cpython/pull/28488

-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5AK2VA5FAFJ3NPYWRJCWNIIV77SEUIO2/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Sam Gross

On Fri, Oct 8, 2021 at 12:24 PM Pablo Galindo Salgado 
wrote:

> When you say "an order of magnitude less overhead than the current
> CPython implementation", do you mean compared with the main branch? We
> have recently implemented almost everything that is listed in this paragraph.
>

I think I wrote that in August when "current CPython" meant something
different from today :) I'll update it.

Thanks for the links to the PRs. I'll need to look at them more closely,
but I think one remaining difference is that the "nogil" interpreter stays
within the same interpreter loop for many Python function calls, while
upstream CPython recursively calls into _PyEval_EvalFrameDefault.

I've been using this mini-benchmark to measure the overhead of Python
function calls for various numbers of arguments and keywords:
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/benchmarks/call_benchmark.py

For zero, two, and four argument functions, I get:

                            0 args   2 args   4 args
  nogil (nogil/fb6aabed)     10ns     14ns     18ns
  3.11  (main/b108db63)      47ns     54ns     63ns
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UZKOY4Y3QWT76TCXJ3QXMEGRODN2DOGB/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Sam Gross

On Fri, Oct 8, 2021 at 12:55 PM Daniel Pope  wrote:

> I'm a novice C programmer, but I'm unsure about the safety of your
> thread-safe collections description.
>

The "list" class uses a slightly different strategy than "dict", which I
forgot about when writing the design overview. List relies on the property
that the backing array of a given list can only grow (never shrink) [1].
This is different from upstream CPython.

Dict stores the capacity inside PyDictKeysObject (the backing array). The
capacity never changes, so if you have a valid pointer to the
PyDictKeysObject you load the correct capacity.

I've been meaning to change "list" to use the same strategy as "dict". I
think that would simplify the overall design and let "list" shrink the
backing array again.

[1]
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/Objects/listobject.c#L46-L49
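A minimal C11 sketch of why keeping the capacity inside the backing array helps a lock-free reader. All names (`backing_t`, `coll_t`, `load_item`) are hypothetical, and the sketch assumes an old array is not freed while a reader might still hold a pointer to it; memory reclamation is out of scope here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical backing array that carries its own, immutable capacity,
 * in the spirit of the PyDictKeysObject arrangement described above. */
typedef struct {
    size_t capacity;            /* set once at allocation, never modified */
    _Atomic(void *) items[];    /* 'capacity' slots */
} backing_t;

typedef struct {
    _Atomic(backing_t *) array; /* replaced wholesale on resize */
} coll_t;

static backing_t *backing_new(size_t capacity)
{
    backing_t *b = malloc(sizeof *b + capacity * sizeof b->items[0]);
    if (b == NULL)
        return NULL;
    b->capacity = capacity;
    for (size_t i = 0; i < capacity; i++)
        atomic_init(&b->items[i], NULL);
    return b;
}

/* Reader: load the array pointer once, then bounds-check against *that*
 * array's capacity. Because the capacity travels with the array, a resize
 * that installs a different array cannot make this check stale. */
static void *load_item(coll_t *c, size_t i)
{
    backing_t *b = atomic_load_explicit(&c->array, memory_order_acquire);
    if (b == NULL || i >= b->capacity)
        return NULL;
    return atomic_load_explicit(&b->items[i], memory_order_acquire);
}
```

The capacity check only guarantees that the slot lies inside allocated memory; whether the slot still holds a live item is left to the later re-checks of the read algorithm.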
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/46TGB2MXWJ37VUQH3R5LW6BOGLIE3PGG/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Larry Hastings



On 10/7/21 8:52 PM, Sam Gross wrote:
> I've been working on changes to CPython to allow it to run without the
> global interpreter lock.

Before anybody asks: Sam contacted me privately some time ago to pick my 
brain a little.  But honestly, Sam didn't need any help--he'd already 
taken the project further than I'd ever taken the Gilectomy.  I have 
every confidence in Sam and his work, and I'm excited he's revealed it 
to the world!

Best wishes,

//arry/

___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CCGH6COYQGCAFZWD32ROUOHRSE4BUL3P/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Daniel Pope

On Fri, 8 Oct 2021 at 03:50, Sam Gross wrote:
> My goal with the proof-of-concept is to demonstrate that removing the GIL is
> feasible and worthwhile, and that the technical ideas of the project could
> serve as a basis of such an effort.

I'm a novice C programmer, but I'm unsure about the safety of your
thread-safe collections description. You describe an algorithm for
lock-free read access to list items as

1. Load the version counter from the collection
2. Load the “backing array” from the collection
3. Load the address of the item (from the “backing array”)
4. Increment the reference count of the item, if it is non-zero
(otherwise retry)
5. Verify that the item still exists at the same location in the
collection (otherwise retry)
6. Verify that the version counter did not change (otherwise retry)
7. Return the address of the item
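The seven steps above can be sketched in C11 as a single optimistic attempt that either returns a new reference or tells the caller to retry. The types and names are hypothetical. One deliberate difference from the code linked below: the index is checked against the capacity stored in the array actually loaded in step 2 (the approach Sam describes for dict), so step 3 cannot read outside that array. Memory ordering is simplified and freeing of old arrays is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { _Atomic size_t refcnt; } obj_t;        /* hypothetical */

typedef struct {
    size_t capacity;                 /* immutable for the array's lifetime */
    _Atomic(obj_t *) items[];
} array_t;

typedef struct {
    _Atomic size_t version;          /* bumped by writers on every mutation */
    _Atomic(array_t *) array;        /* backing array, replaced on resize */
} list_t;

/* One optimistic attempt. Returns the item with a new reference, or NULL if
 * the caller should fall back to a slow path (e.g. retry under a mutex). */
static obj_t *try_getitem(list_t *l, size_t i)
{
    size_t v = atomic_load_explicit(&l->version, memory_order_acquire);   /* 1 */
    array_t *a = atomic_load_explicit(&l->array, memory_order_acquire);   /* 2 */
    if (a == NULL || i >= a->capacity)
        return NULL;
    obj_t *item = atomic_load_explicit(&a->items[i], memory_order_acquire); /* 3 */
    if (item == NULL)
        return NULL;
    size_t rc = atomic_load_explicit(&item->refcnt, memory_order_relaxed);
    do {                                                                  /* 4 */
        if (rc == 0)
            return NULL;             /* object is being freed: retry */
    } while (!atomic_compare_exchange_weak(&item->refcnt, &rc, rc + 1));
    if (atomic_load_explicit(&a->items[i], memory_order_acquire) != item ||  /* 5 */
        atomic_load_explicit(&l->array, memory_order_acquire) != a ||
        atomic_load_explicit(&l->version, memory_order_acquire) != v) {      /* 6 */
        /* Undo step 4. A real implementation would use its normal decref
         * path here, which may have to free the object. */
        atomic_fetch_sub(&item->refcnt, 1);
        return NULL;
    }
    return item;                                                          /* 7 */
}
```

Returning NULL folds "retry" and "index out of range" together; the caller would redo the lookup on a slow path (for example under the per-collection mutex, as the design overview says it typically does for retries) and report an IndexError from there.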

But you do the bounds check for the index before this, here[1]. If the
thread is suspended after this and before you read the address of the
backing array [2], the list could have been resized (shrunk), and the
backing array reallocated from a new memory block. So the pointer you
read at 3 could be from uninitialized memory that is beyond the size
of the array (or within the array but larger than the current number
of items). And then you write to it at 4 which is then a write into a
random memory location.

[1]
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/Objects/listobject.c#L137
[2]
https://github.com/colesbury/nogil/blob/fb6aabede5f7f1936a21c2f48ec7fcc0848d74bf/Objects/listobject.c#L141
___
Message archived at
https://mail.python.org/archives/list/python-dev@python.org/message/6J6XFEACF2C6XPLZRVABUFFHJICUTZCS/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Pablo Galindo Salgado

>
> To speed-up function calls, the interpreter uses a linear, resizable stack
> to store function call frames, an idea taken from LuaJIT. The stack stores
> the interpreter registers (local variables + space for temporaries) plus
> some extra information per-function call. This avoids the need for
> allocating PyFrameObjects for each call. For compatibility, the
> PyFrameObject type still exists, but they are created lazily as-needed
> (such as for exception handling and for sys._getframe).

> The optimized function calls have about an order of magnitude less overhead
> than the current CPython implementation.

> The change also simplifies the use of deferred reference counting with the
> data that is stored per-call like the function object. The interpreter can
> usually avoid incrementing the reference count of the function object
> during a call. Like other objects on the stack, a borrowed reference to the
> function is indicated by setting the least-significant-bit.
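The "borrowed reference indicated by setting the least-significant bit" idea can be sketched as pointer tagging. This assumes objects are at least 2-byte aligned, so bit 0 of a real pointer is always zero; `stackref_t` and the helper names are hypothetical, not the nogil interpreter's actual macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stack slot: an object pointer whose lowest bit records
 * whether the slot holds a borrowed (uncounted) reference. */
typedef uintptr_t stackref_t;

#define BORROWED_TAG ((uintptr_t)1)

static stackref_t make_borrowed(void *obj)
{
    assert(((uintptr_t)obj & BORROWED_TAG) == 0);   /* alignment assumption */
    return (uintptr_t)obj | BORROWED_TAG;
}

static stackref_t make_owned(void *obj)
{
    assert(((uintptr_t)obj & BORROWED_TAG) == 0);
    return (uintptr_t)obj;
}

static bool is_borrowed(stackref_t ref)
{
    return (ref & BORROWED_TAG) != 0;
}

/* Strip the tag to recover the real object pointer. */
static void *ref_ptr(stackref_t ref)
{
    return (void *)(ref & ~BORROWED_TAG);
}

/* When a frame is popped, only owned references need a decref. */
static void release(stackref_t ref, void (*decref)(void *))
{
    if (!is_borrowed(ref))
        decref(ref_ptr(ref));
}
```

Because a borrowed slot is skipped on release, a call can place the function object on the stack without touching its reference count at all.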


Congrats Sam! This is incredible work! One quick question after reading the
design doc:

When you say "an order of magnitude less overhead than the current CPython
implementation", do you mean compared with the main branch? We have recently
implemented almost everything that is listed in this paragraph:

https://github.com/python/cpython/pull/27077

We also packed some similar extra optimizations into this other PR, including
stealing the frame arguments in Python-to-Python calls:

https://github.com/python/cpython/pull/28488

This could explain why the performance is closer to the current master
branch as you indicate:

> It gets about the same average performance as the “main” branch of CPython
> 3.11 as of early September 2021.


Cheers from cloudy London,
Pablo Galindo Salgado

On Fri, 8 Oct 2021 at 03:49, Sam Gross  wrote:

> Hi,
>
> I've been working on changes to CPython to allow it to run without the
> global interpreter lock. I'd like to share a working proof-of-concept that
> can run without the GIL. The proof-of-concept involves substantial changes
> to CPython internals, but relatively few changes to the C-API. It is
> compatible with many C extensions: extensions must be rebuilt, but usually
> require small or no modifications to source code. I've built compatible
> versions of packages from the scientific Python ecosystem, and they are
> installable through the bundled "pip".
>
> Source code:
> https://github.com/colesbury/nogil
>
> Design overview:
>
> https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit
>
> My goal with the proof-of-concept is to demonstrate that removing the GIL
> is feasible and worthwhile, and that the technical ideas of the project
> could serve as a basis of such an effort.
>
> I'd like to start a discussion about these ideas and gauge the community's
> interest in this approach to removing the GIL.
>
> Regards,
> Sam Gross
> colesb...@gmail.com / sgr...@fb.com
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/ABR2L6BENNA6UPSPKV474HCS4LWT26GY/
>
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/K4SHFGDVAZUMAAKX5ZANAQIVYBPUSLEI/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Nathaniel Smith

On Thu, Oct 7, 2021 at 7:54 PM Sam Gross  wrote:
> Design overview:
> https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit

Whoa, this is impressive work.

I notice the fb.com address -- is this a personal project or something
facebook is working on? what's the relationship to Cinder, if any?

Regarding the tricky lock-free dict/list reads: I guess the more
straightforward approach would be to use a plain ol' mutex that's
optimized for this kind of fine-grained per-object lock with short
critical sections and minimal contention, like WTF::Lock. Did you try
alternatives like that? If so, I assume they didn't work well -- can
you give more details?

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XAZYWRYXKIVUSMRSMFAETKQDLGL27L7X/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Chris Jerdonek

On Fri, Oct 8, 2021 at 8:11 AM Guido van Rossum  wrote:

> To be clear, Sam’s basic approach is a bit slower for single-threaded
> code, and he admits that.
>

Is it also slower even when running with PYTHONGIL=1? If it could be made
the same speed for single-threaded code when running in GIL-enabled mode,
that might be an easier intermediate target while still adding value.

—Chris


> But to sweeten the pot he has also applied a bunch of unrelated speedups
> that make it faster in general, so that overall it’s always a win. But
> presumably we could upstream the latter easily, separately from the
> GIL-freeing part.
>
> On Fri, Oct 8, 2021 at 07:42 Łukasz Langa  wrote:
>
>>
>> > On 8 Oct 2021, at 10:13, Steven D'Aprano  wrote:
>> >
>> > Hi Sam,
>> >
>> > On Thu, Oct 07, 2021 at 03:52:56PM -0400, Sam Gross wrote:
>> >
>> >> I've been working on changes to CPython to allow it to run without the
>> >> global interpreter lock. I'd like to share a working proof-of-concept that
>> >> can run without the GIL.
>> >
>> > Getting Python to run without the GIL has never been a major problem for
>> > CPython (and of course some other Python interpreters don't have a GIL
>> > at all).
>>
>> On the first page of Sam's design overview he references Gilectomy by
>> name.
>>
>> > Single threaded code is still, and always will be, an important part of
>> > Python's ecosystem. A lot of people would be annoyed if the cost of
>> > speeding up heavily threaded Python by a small percentage would be to
>> > slow down single-threaded Python by a large percentage.
>>
>> Quoting that same design document, Sam writes: "The new interpreter
>> (together with the GIL changes) is about 10% faster than CPython 3.9
>> on the single-threaded pyperformance benchmarks."
>>
>> - Ł
>> ___
>> Message archived at
>> https://mail.python.org/archives/list/python-dev@python.org/message/JO7OQCHZKIFNKSXTTXT2JBCF5H47M7OO/
>>
> --
> --Guido (mobile)
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/XQOOGKH5PIFBHJRK7W2LMX32DIGIH4KX/
>
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/STEMG6WAORYZ2WVMXZZPYSQVEUNNXCSW/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Guido van Rossum

To be clear, Sam’s basic approach is a bit slower for single-threaded code,
and he admits that. But to sweeten the pot he has also applied a bunch of
unrelated speedups that make it faster in general, so that overall it’s
always a win. But presumably we could upstream the latter easily,
separately from the GIL-freeing part.

On Fri, Oct 8, 2021 at 07:42 Łukasz Langa  wrote:

>
> > On 8 Oct 2021, at 10:13, Steven D'Aprano  wrote:
> >
> > Hi Sam,
> >
> > On Thu, Oct 07, 2021 at 03:52:56PM -0400, Sam Gross wrote:
> >
> >> I've been working on changes to CPython to allow it to run without the
> >> global interpreter lock. I'd like to share a working proof-of-concept that
> >> can run without the GIL.
> >
> > Getting Python to run without the GIL has never been a major problem for
> > CPython (and of course some other Python interpreters don't have a GIL
> > at all).
>
> On the first page of Sam's design overview he references Gilectomy by name.
>
> > Single threaded code is still, and always will be, an important part of
> > Python's ecosystem. A lot of people would be annoyed if the cost of
> > speeding up heavily threaded Python by a small percentage would be to
> > slow down single-threaded Python by a large percentage.
>
> Quoting that same design document, Sam writes: "The new interpreter
> (together with the GIL changes) is about 10% faster than CPython 3.9
> on the single-threaded pyperformance benchmarks."
>
> - Ł
> ___
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/JO7OQCHZKIFNKSXTTXT2JBCF5H47M7OO/
>
-- 
--Guido (mobile)
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XQOOGKH5PIFBHJRK7W2LMX32DIGIH4KX/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Łukasz Langa


> On 8 Oct 2021, at 10:13, Steven D'Aprano  wrote:
> 
> Hi Sam,
> 
> On Thu, Oct 07, 2021 at 03:52:56PM -0400, Sam Gross wrote:
> 
>> I've been working on changes to CPython to allow it to run without the
>> global interpreter lock. I'd like to share a working proof-of-concept that
>> can run without the GIL.
> 
> Getting Python to run without the GIL has never been a major problem for
> CPython (and of course some other Python interpreters don't have a GIL
> at all).

On the first page of Sam's design overview he references Gilectomy by name.

> Single threaded code is still, and always will be, an important part of
> Python's ecosystem. A lot of people would be annoyed if the cost of
> speeding up heavily threaded Python by a small percentage would be to
> slow down single-threaded Python by a large percentage.

Quoting that same design document, Sam writes: "The new interpreter
(together with the GIL changes) is about 10% faster than CPython 3.9
on the single-threaded pyperformance benchmarks."

- Ł


___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/JO7OQCHZKIFNKSXTTXT2JBCF5H47M7OO/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-08 Thread Steven D'Aprano

Hi Sam,

On Thu, Oct 07, 2021 at 03:52:56PM -0400, Sam Gross wrote:

> I've been working on changes to CPython to allow it to run without the
> global interpreter lock. I'd like to share a working proof-of-concept that
> can run without the GIL.

Getting Python to run without the GIL has never been a major problem for 
CPython (and of course some other Python interpreters don't have a GIL 
at all). I think the first attempt was in 1999, a mere handful of years 
after Python was released.

https://www.artima.com/weblogs/viewpost.jsp?thread=214235

The problem has been removing the GIL without seriously degrading 
performance. How does your GIL-less CPython fork perform? Especially for 
single-threaded code.

Have you been following progress of the GILectomy?

https://pythoncapi.readthedocs.io/gilectomy.html

Single threaded code is still, and always will be, an important part of 
Python's ecosystem. A lot of people would be annoyed if the cost of 
speeding up heavily threaded Python by a small percentage would be to 
slow down single-threaded Python by a large percentage.

-- 
Steve
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/CFPTMVFVPYHCGUPXYVBUYZBPLEVLSFIM/

[Python-Dev] Re: Python multithreading without the GIL

2021-10-07 Thread Chris Angelico

On Fri, Oct 8, 2021 at 1:51 PM Sam Gross  wrote:
>
> Hi,
>
> I've been working on changes to CPython to allow it to run without the global 
> interpreter lock. I'd like to share a working proof-of-concept that can run 
> without the GIL. The proof-of-concept involves substantial changes to CPython 
> internals, but relatively few changes to the C-API. It is compatible with 
> many C extensions: extensions must be rebuilt, but usually require small or 
> no modifications to source code. I've built compatible versions of packages 
> from the scientific Python ecosystem, and they are installable through the 
> bundled "pip".
>
> Source code:
> https://github.com/colesbury/nogil
>
> Design overview:
> https://docs.google.com/document/d/18CXhDb1ygxg-YXNBJNzfzZsDFosB5e6BfnXLlejd9l0/edit
>
> My goal with the proof-of-concept is to demonstrate that removing the GIL is 
> feasible and worthwhile, and that the technical ideas of the project could 
> serve as a basis of such an effort.
>

Thanks for doing this, and thanks for the detailed writeup! I'd like
to offer a perspective from observing the ongoing project of a brother
of mine; he does not have the concurrency experience that I have, and
it's been instructive to see what he has trouble with. For reference,
the project involves GTK (which only works on the main thread),
multiple threads for I/O (eg a socket read/parse/process thread), and
one thread managed by asyncio using async/await functions.

At no point has he ever had a problem with performance, because the
project is heavily I/O based, spending most of its time waiting for
events. So this is not going to touch on the question of
single-threaded vs multi-threaded performance.

To him, an async function and a thread function are exactly
equivalent. He doesn't think in terms of yield points or anything;
they are simply two ways of doing parallelism and are, to his code,
equivalent.

Mutable shared state is something to get your head around with *any*
sort of parallelism, and nothing will change that. Whether it's
asyncio, GUI callbacks, or actual threads, the considerations have
been exactly the same. Threads neither gain nor lose compared to other
options.

Not being a low-level programmer, he has, I believe, an inherent
assumption that any operation on a built-in type will be atomic. He's
never stated this but I suspect he's assuming that. It's an assumption
that Python is never going to violate.

Concurrency is *hard*. There's no getting around it, there's no
sugar-coating it. There are concepts that simply have to be learned,
and the failures can be extremely hard to track down. Instantiating an
object on the wrong thread can crash GTK, but maybe not immediately.
Failing to sleep in one thread results in other threads stalling. I
don't think any of this is changed by different modes (with the
exception of process-based parallelism, which fixes a lot of
concurrency at the cost of explicit IPC), and the more work
programmers want their code to do, the more likely that they'll run
into this.

GLib.idle_add is really just a magic incantation to make the GUI work. :)

Spawning a thread for asyncio isn't too hard as long as you don't have
to support older Python versions. Sadly, not every device updated
at the same time, but in a few years we will be able to ignore Python
versions before 3.7.

Most likely, none of his code would be affected by the removal of the
GIL, since (as I understand it) the guarantees as seen in Python code
won't change. Will there be impact on lower-memory systems? As small
devices go, the Raspberry Pi is one of the largest, but it's still a
lot smaller than a full PC, and adding overhead to every object would
be costly (I'm not sure what the cost of local reference counting is,
but it can't be none). Threading is perfectly acceptable for a project
like this, so I'm hoping that GIL removal won't unnecessarily penalize
this kind of thread usage.

Speaking of local refcounting, how does that affect objects that get
passed as thread arguments? Initially, an object is owned by the
creating thread, which can relinquish ownership if its local refcount
drops to zero; does another thread then take it over?
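A minimal C11 sketch of the local/shared split being asked about: the owning thread bumps a plain local counter, and every other thread bumps an atomic shared counter. The field names and the thread-id trick are assumptions for illustration, not the nogil layout. How ownership is given up or merged when the local count drops to zero is exactly the open question, so it is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical object header with a split reference count. */
typedef struct {
    uintptr_t owner_tid;            /* owning thread's id; 0 = no owner */
    uint32_t local_refcnt;          /* only touched by the owner: no atomics */
    _Atomic int32_t shared_refcnt;  /* touched by all other threads */
} obj_t;

/* Cheap per-thread id: the address of a thread-local variable. This is an
 * assumption for the sketch; an address may be reused after a thread exits. */
static uintptr_t current_tid(void)
{
    static _Thread_local char marker;
    return (uintptr_t)&marker;
}

static void obj_init(obj_t *op)
{
    op->owner_tid = current_tid();  /* the creating thread owns the object */
    op->local_refcnt = 1;
    atomic_init(&op->shared_refcnt, 0);
}

static void incref(obj_t *op)
{
    if (op->owner_tid == current_tid())
        op->local_refcnt++;         /* fast, non-atomic path for the owner */
    else
        atomic_fetch_add_explicit(&op->shared_refcnt, 1, memory_order_relaxed);
}
```

An object passed to a new thread as an argument would, under this sketch, keep its original owner; the new thread's increments would all go to the shared counter.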

I'm excited by anything that helps parallelism in Python, and very
curious to see where this effort will go. If you need a hand with
testing, I'd be happy to help out.

ChrisA
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/PGUFHTN4YY4X4OOTCXYPVEJVAWBZZS4K/