Hello,
Numba (https://numba.pydata.org/) is a just-in-time compiler for Python
that can speed up numerical and scientific calculations written in
plain Python. Out of the box, Numba supports NumPy arrays (which were
the primary target of its design).
We (at QuantStack) have been investigating the feasibility of supporting
a subset of PyArrow in Numba, so that the fast computation abilities of
Numba can extend to data in the Arrow format.
We have come to the conclusion that supporting a small subset of PyArrow
is definitely doable, at a competitive performance level (between "as
fast as C++" and "4x slower" on a couple of preliminary micro-benchmarks).
(By "small subset" we mostly mean: primitive data types, and reading and
building arrays.)
The Numba integration layer would ideally be maintained and distributed
within PyArrow, because it needs access to a number of Arrow C++ APIs,
which don't have a stable ABI (it *might* be possible to work around
this by exporting a dedicated C-like ABI from PyArrow, though).
What we would like to know is how the community feels about putting this
code inside PyArrow, rather than in a separate package, for the reason
given above.
This would *not* add a dependency on Numba, since this can be exposed as
a dynamically-loaded extension point:
https://numba.readthedocs.io/en/stable/extending/entrypoints.html
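For reference, that entry point mechanism boils down to declaring a hook
in the package metadata; Numba discovers and calls the hook lazily, the
first time Numba itself is imported. A hypothetical sketch (the module
and function names below are placeholders, not an actual proposal):

```python
# setup.py (sketch): registering a Numba extension entry point.
# PyArrow would not import, or depend on, Numba: Numba imports the
# hook, not the other way around.
from setuptools import setup

setup(
    name="pyarrow",
    # ...
    entry_points={
        "numba_extensions": [
            # hypothetical module implementing the Arrow support
            "init = pyarrow._numba_extension:init",
        ],
    },
)
```

So users without Numba installed would see no behavior change at all.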
(note: this preliminary investigation was supported by one of our fine
customers)
Regards
Antoine.