I think the missing detail is that you need to add `#cython: fast_gil = True` to enable it.
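For anyone reproducing this: as far as I know, the same directive can also be passed from the build script rather than as a header comment. A minimal sketch, assuming a placeholder source file `bench.pyx` and a hypothetical helper name `build_with_fast_gil` (neither is taken from the GH issue):

```python
# Sketch only: "bench.pyx" and build_with_fast_gil are placeholder names.
# Intended to have roughly the same effect as a "# cython: fast_gil=True"
# header comment at the top of the .pyx file.
FAST_GIL_DIRECTIVES = {"fast_gil": True}


def build_with_fast_gil(pyx_path="bench.pyx"):
    """Cythonize one module with the fast_gil directive switched on."""
    # Imported lazily so this file can be read without Cython installed.
    from Cython.Build import cythonize

    return cythonize(pyx_path, compiler_directives=FAST_GIL_DIRECTIVES)
```

The returned extension list would normally be handed to `setup(ext_modules=...)`.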

For me:
Python 3.9 and 3.10 are basically identical (on master)

**test_gil_already_held**
with fast_gil
Running the test...
took 0.175062894821167
without
Running the test...
took 0.10976791381835938

**test_gil_released**
with fast_gil
Running the test...
took 0.583066463470459
without
Running the test...
took 0.5824759006500244

test_gil_already_held is noticeably faster with fast_gil.

For Python 3.11:
I get the crash in 0.29.x if I try to run using fast_gil. No defines are needed to get that...
On master:

**test_gil_already_held**
with fast_gil
Running the test...
took 0.17254948616027832
without
Running the test...
took 0.10958600044250488

**test_gil_released**
with fast_gil
Running the test...
took 0.5791811943054199
without
Running the test...
took 0.5597968101501465

Note that "without fast_gil" is now as fast as "with fast_gil" used to be, and fast_gil itself is now slower. This is reproducible.
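To put a number on the Python 3.11 difference, here is a small sketch that just divides the timings listed above for 3.11 on master. The dictionary layout and the `slowdown` helper are my own illustration, not part of the benchmark:

```python
# Timings copied from the Python 3.11 / master results above (seconds).
timings_311 = {
    "test_gil_already_held": {
        "with": 0.17254948616027832,
        "without": 0.10958600044250488,
    },
    "test_gil_released": {
        "with": 0.5791811943054199,
        "without": 0.5597968101501465,
    },
}


def slowdown(with_fast_gil, without_fast_gil):
    """Return the time ratio; a value above 1 means fast_gil took longer."""
    return with_fast_gil / without_fast_gil


ratios = {
    name: slowdown(t["with"], t["without"]) for name, t in timings_311.items()
}
for name, ratio in ratios.items():
    print(f"{name}: with / without = {ratio:.2f}")
```

With these numbers the fast_gil build takes roughly 1.57x as long for test_gil_already_held and about 1.03x as long for test_gil_released.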

On Python 3.12 on master, the two are identical by default (which makes sense, since I think we disable it there). Defining -DCYTHON_FAST_GIL brings us back to roughly the same numbers as on 3.11 (i.e. fast_gil is now slower).
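One way to flip the macro without editing any build files is to append it to CFLAGS for the build process. A rough sketch, assuming the `=0`/`=1` form from Stefan's message; the helper name `env_with_fast_gil` is hypothetical:

```python
import os


def env_with_fast_gil(enabled, base_env=None):
    """Return a copy of the environment with CYTHON_FAST_GIL set in CFLAGS."""
    # Copy so the caller's (or the process's) environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    flag = f"-DCYTHON_FAST_GIL={1 if enabled else 0}"
    # Append to any existing CFLAGS rather than overwriting them.
    env["CFLAGS"] = f"{env.get('CFLAGS', '')} {flag}".strip()
    return env


# Possible usage (the build command itself is an assumption):
#   subprocess.run([sys.executable, "setup.py", "build_ext", "--inplace"],
#                  env=env_with_fast_gil(True), check=True)
```

Building once with `enabled=True` and once with `enabled=False` gives the two variants to compare.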

So my conclusion is that from 3.11 onwards Python sped up their own GIL handling to about the same as we used to have, and fastgil has turned into a pessimization.

David




On 19/09/2023 11:58, Lisandro Dalcin wrote:
Disclaimer: I may be doing something wrong, I did not put a lot of effort into it. With the microbenchmark that was offered in the GH issue, I see little difference.
Use the attached zip file to reproduce yourself.
Change tox.ini to "cython<3" to try 0.29.x.
BTW, in the 0.29.x case, I see no compilation error as claimed in the GH issue.

$ ./run.sh
CFLAGS=-g0 -Ofast -DCYTHON_FAST_GIL=0
Running test_gil_already_held ... took 0.08735537528991699
Running test_gil_released     ... took 0.6329536437988281
py37: OK ✔ in 3.57 seconds
Running test_gil_already_held ... took 0.09007453918457031
Running test_gil_released     ... took 0.4598276615142822
py38: OK ✔ in 3.19 seconds
Running test_gil_already_held ... took 0.10935306549072266
Running test_gil_released     ... took 0.4512367248535156
py39: OK ✔ in 3.25 seconds
Running test_gil_already_held ... took 0.09970474243164062
Running test_gil_released     ... took 0.46637773513793945
py310: OK ✔ in 3.21 seconds
Running test_gil_already_held ... took 0.08569073677062988
Running test_gil_released     ... took 0.46811795234680176
py311: OK ✔ in 3.22 seconds
Running test_gil_already_held ... took 0.15221118927001953
Running test_gil_released     ... took 0.2246694564819336
  py37: OK (3.57 seconds)
  py38: OK (3.19 seconds)
  py39: OK (3.25 seconds)
  py310: OK (3.21 seconds)
  py311: OK (3.22 seconds)
  pypy3.9: OK (5.24 seconds)
  congratulations :) (21.71 seconds)
CFLAGS=-g0 -Ofast -DCYTHON_FAST_GIL=1
Running test_gil_already_held ... took 0.08835673332214355
Running test_gil_released     ... took 0.6265637874603271
py37: OK ✔ in 1.42 seconds
Running test_gil_already_held ... took 0.09030938148498535
Running test_gil_released     ... took 0.456279993057251
py38: OK ✔ in 1.17 seconds
Running test_gil_already_held ... took 0.10986089706420898
Running test_gil_released     ... took 0.45894527435302734
py39: OK ✔ in 1.2 seconds
Running test_gil_already_held ... took 0.10107588768005371
Running test_gil_released     ... took 0.5052204132080078
py310: OK ✔ in 1.21 seconds
Running test_gil_already_held ... took 0.08566665649414062
Running test_gil_released     ... took 0.4581136703491211
py311: OK ✔ in 1.13 seconds
Running test_gil_already_held ... took 0.15286779403686523
Running test_gil_released     ... took 0.22533607482910156
  py37: OK (1.42 seconds)
  py38: OK (1.17 seconds)
  py39: OK (1.20 seconds)
  py310: OK (1.21 seconds)
  py311: OK (1.13 seconds)
  pypy3.9: OK (1.64 seconds)
  congratulations :) (7.81 seconds)


On Tue, 19 Sept 2023 at 10:09, Stefan Behnel <stefan...@behnel.de> wrote:

    Hi,

    I've seen reports that Cython's "FastGIL" implementation (which
    basically keeps the GIL state in a thread-local variable) is no
    longer faster than CPython's plain GIL implementation in recent
    Python 3.x versions. Potentially even slower. See the report in

    https://github.com/cython/cython/issues/5703

    It would be helpful to get user feedback on this.

    If you have GIL-heavy Cython code, especially with nested
    with-nogil/with-gil sections across functions, and a benchmark that
    exercises it, could you please run the benchmark with and without the
    feature enabled and report the results?

    You can add "-DCYTHON_FAST_GIL=0" to your CFLAGS to disable it
    (and "=1" to enable it explicitly). It's enabled by default in
    CPython 3.6-3.11 (but disabled in Cython 0.29.x on Python 3.11).

    Thanks,
    Stefan
    _______________________________________________
    cython-devel mailing list
    cython-devel@python.org
    https://mail.python.org/mailman/listinfo/cython-devel



--
Lisandro Dalcin
============
Senior Research Scientist
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
