[Numpy-discussion] Improving the Thread Safety of NumPy's Test Suite

Britney Whittington Thu, 18 Sep 2025 14:12:12 -0700

Hello! My name is Britney Whittington, and I'm an intern at Quansight working 
with Nathan Goldbaum. This is my first time working with an open-source project 
like NumPy, and I'm excited to be here!

For my project, I am working on making the NumPy test suite more thread safe,
to improve NumPy's support for the free-threaded build of Python 3.14. To do
this, I have been using `pytest-run-parallel` [1]. See the plugin's README for
details, but briefly, it runs each test in a test suite many times in a thread
pool. Doing so produces failures in the test suite, often due to thread-safety
issues in pytest itself, or to thread-safety issues in NumPy caused by its use
of global state. As we work on making the test suite more thread safe, we have
exposed thread-safety issues in NumPy itself.
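To make "runs each test many times in a thread pool" concrete, here is a rough,
standalone emulation built on `concurrent.futures`. It is a sketch of the idea,
not `pytest-run-parallel`'s actual implementation; `run_in_threads` and both
test functions are hypothetical names chosen for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_in_threads(test_func, n_threads=8):
    """Call test_func once in each of n_threads threads at roughly the same time.

    A simplified emulation of the idea behind pytest-run-parallel,
    not the plugin's actual implementation.
    """
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # release all threads together to maximize overlap
        test_func()

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(worker) for _ in range(n_threads)]
        for future in futures:
            future.result()  # re-raises any exception raised in a worker


_seen = []  # module-level state, shared by every thread


def test_uses_global_state():
    _seen.append(1)
    # Whichever thread appends last sees every other thread's append, so with
    # two or more threads this assertion fails in at least one thread.
    assert len(_seen) == 1


def test_uses_local_state():
    seen = []  # created per call, so each thread gets its own list
    seen.append(1)
    assert len(seen) == 1
```

`run_in_threads(test_uses_local_state)` returns normally, while
`run_in_threads(test_uses_global_state)` raises `AssertionError`: the test
logic is identical, and only the shared state differs.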

Part of the difficulty of using `pytest-run-parallel` is that pytest itself,
and the constructs it provides, are not thread safe [2], so it takes some
effort to make a test suite as large as NumPy's run under `pytest-run-parallel`
without any failures.

We've already merged some work along these lines; you may have seen some of the
PRs I've submitted for this project [3]. We are getting close to turning on
`pytest-run-parallel` in CI, and before we do, we want to make sure there
aren't any objections.

So far, I've made two major changes to the test suite:

----------------------------------------------------
1. Refactoring xunit Setup and Teardown Methods
----------------------------------------------------
#### Problem
- NumPy makes heavy use of pytest's xunit setup and teardown methods [4], which
are mentioned in the testing guidelines [5].
- When using `pytest-run-parallel`, pytest can run the teardown before all
threads have finished running a test, so threads that are still running see
already torn-down state.

#### Solutions
- Replace setup/teardown with fixtures [6]. While fixtures play more nicely
with threads, they are still shared between threads: if one thread mutates a
fixture, the mutation is visible to all the other threads. Additionally,
dependency injection via a fixture is fundamentally *hard to debug and
understand at a glance*.
- Use **explicit setup**. Instead of having pytest call setup methods, we
modify each test to call its setup method explicitly. This fixes the teardown
issue and lets us declare variables locally, without worrying about mutations
being shared between threads.
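A minimal before/after sketch of the explicit-setup refactor, assuming a simple
test class. The class names and the `_setup` helper are hypothetical, not code
taken from the NumPy test suite.

```python
import numpy as np


# Before: xunit-style setup. pytest calls setup_method() and the data lives on
# the test-class instance, so anything the test mutates persists on `self`.
class TestSumXunit:
    def setup_method(self):
        self.data = np.arange(10)

    def test_sum(self):
        self.data += 1  # mutates shared instance state in place
        assert self.data.sum() == 55


# After: explicit setup. Each test calls a plain helper itself, so every call
# (and therefore every thread) works on its own local array.
class TestSumExplicit:
    def _setup(self):
        return np.arange(10)

    def test_sum(self):
        data = self._setup()
        data += 1  # local, so invisible to any other call or thread
        assert data.sum() == 55
```

Calling `TestSumXunit().test_sum()` twice after a single `setup_method()`
call, roughly what happens when several threads share one setup, fails the
second time because the first call already mutated `self.data`.
`TestSumExplicit().test_sum()` can be called any number of times.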

Currently we favor explicit setup. Of course, this may not work for every xunit
setup; for more complex cases, fixtures or context managers may be more
useful.
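For cases that need real teardown, a context manager keeps both setup and
cleanup inside the test body, so each call (and each thread) gets its own
resource. A sketch using `contextlib.contextmanager`; `saved_array` is a
hypothetical helper, and the temporary file is just a stand-in for any resource
that must be cleaned up.

```python
import contextlib
import os
import tempfile

import numpy as np


@contextlib.contextmanager
def saved_array(arr):
    """Hypothetical helper: save arr to a fresh temporary .npy file, yield its
    path, and always delete the file afterwards (the "teardown")."""
    fd, path = tempfile.mkstemp(suffix=".npy")  # unique file per call
    os.close(fd)
    try:
        np.save(path, arr)
        yield path
    finally:
        os.remove(path)


def test_roundtrip():
    arr = np.arange(5)
    with saved_array(arr) as path:  # setup runs here, once per call
        loaded = np.load(path)
        assert np.array_equal(loaded, arr)
    # teardown has already run once the with block exits
    assert not os.path.exists(path)
```

Because `tempfile.mkstemp` creates a distinct file on every call, concurrent
runs of `test_roundtrip` never touch each other's data.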

-------------------------------------------
2. Refactoring Global np.random Calls
-------------------------------------------
#### Problem
- Calls to the `np.random` module-level functions all use the same global
`RandomState` instance. Tests that rely on seeded results can therefore fail,
because threads share, and advance, the same global RNG state. Even where this
does not cause failures, we feel that it is fundamentally bad practice for
tests to rely on global state like this.

#### Solution
- Instead of `np.random`, each test should use a local instance of
`np.random.RandomState`, so that each thread advances through its own local
RNG stream.

I have made this change to the tests that fail under `pytest-run-parallel` [7],
and am working on making all test calls to `np.random` local. Note that
`RandomState` uses the same Mersenne Twister (MT19937) generator as the global
RNG, so for a given seed the RNG streams are the same as before.
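A short sketch of the change: the same seeded draw, first through the global
RNG and then through a local `RandomState`. The function names `draw_global`
and `draw_local` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def draw_global(seed):
    # Before: seeds and draws from the single global RandomState. If another
    # thread draws or reseeds between these two calls, the result changes.
    np.random.seed(seed)
    return np.random.rand(3)


def draw_local(seed):
    # After: this call owns its RandomState, so no other thread can advance it.
    rng = np.random.RandomState(seed)
    return rng.rand(3)


# Both use MT19937, so a given seed yields the same values either way.
assert np.array_equal(draw_global(1234), draw_local(1234))

# The local version gives identical results in every thread.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(draw_local, [1234] * 8))
assert all(np.array_equal(r, results[0]) for r in results)
```

Since the streams match, existing expected values in seeded tests should not
need to change when switching to a local `RandomState`.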

===============================

-----------------------------------------
> What Does This Mean Going Forward?
-----------------------------------------
If we add `pytest-run-parallel` to CI, contributors may need to write tests in
a somewhat different, thread-safe style, following the refactors described
above.

It is possible for `pytest-run-parallel` to fix some of these issues on its
side, such as making xunit setup run properly. However, given the current
state of pytest and the plugin, this would require a lot of work, time, and
maintenance. It may also make it more difficult to improve the thread safety of
pytest itself in the future. See this issue [8] for further discussion. We
think that refactoring the NumPy test suite is more straightforward for now,
and the refactor can always be reverted once `pytest-run-parallel` develops a
way to handle thread-unsafe setup fixtures.

--------------------
> Testing Guidelines
--------------------
In addition to making the tests thread safe, I'd like to update the testing
guidelines [9]. Some things could be clarified (such as NumPy's stance on
fixture usage), and it would be good to update them with current best practices
and, if folks are open to it, guidelines for writing thread-safe tests.

[1] https://github.com/Quansight-Labs/pytest-run-parallel
[2] https://docs.pytest.org/en/stable/explanation/flaky.html#thread-safety
[3] https://github.com/numpy/numpy/issues/29552 (referenced by all PRs related to this project)
[4] https://docs.pytest.org/en/stable/how-to/xunit_setup.html
[5] https://numpy.org/doc/stable/reference/testing.html#easier-setup-and-teardown-functions-methods
[6] https://docs.pytest.org/en/stable/explanation/fixtures.html
[7] https://github.com/numpy/numpy/pull/29729
[8] https://github.com/Quansight-Labs/pytest-run-parallel/issues/14
[9] https://numpy.org/doc/stable/reference/testing.html
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3//lists/numpy-discussion.python.org
Member address: [email protected]