Hi Li,

The main concern is ongoing maintenance: who will be able to shoulder the effort, especially since the supporting code will involve non-trivial code generation routines? (Numba's "high-level" extension API is not sufficient to cover all use cases.)

Of course, there are also reasons not to be worried too much:

1. Numba's APIs are generally stable, as the project doesn't evolve much (as a matter of fact, although I left Numba in 2017, I didn't encounter any major surprises when doing this work).

2. I'm an active maintainer in the Arrow monorepo, which makes this less of a concern than for e.g. Gandiva, where the original maintainers were not involved in the rest of the project.

Regards

Antoine.


On 27/03/2026 at 01:20, Li Jin wrote:
Hi Antoine,

This is exciting work. I am generally in favor of putting it inside PyArrow
for ease of use and the ABI reasons above. Can you explain a bit more what
the downsides are of putting it in PyArrow vs a separate package?

Li

On Thu, Mar 26, 2026 at 11:08 AM Antoine Pitrou <[email protected]> wrote:


Hello,

Numba (https://numba.pydata.org/) is a Just-in-Time compiler for Python
that can speed up scientific calculations written in Python. Out
of the box, Numba supports NumPy arrays (which were the primary target
for its design).

We (at QuantStack) have been investigating the feasibility of supporting
a subset of PyArrow in Numba, so that the fast computation abilities of
Numba can extend to data in the Arrow format.

We have come to the conclusion that supporting a small subset of PyArrow
is definitely doable, at a competitive performance level (between "as
fast as C++" and "4x slower" on a couple of preliminary micro-benchmarks).

(by "small subset" we mostly mean: primitive data types, reading and
building arrays)

The Numba integration layer would ideally have to be maintained and
distributed within PyArrow, because of the need to access a number of
Arrow C++ APIs, which don't have a stable ABI (it *might* be possible to
work around this by exporting a dedicated C-like ABI from PyArrow, though).

What we would like to know is how the community feels about putting this
code inside PyArrow, rather than in a separate package, for the reason
given above.

This would *not* add a dependency on Numba, since this can be exposed as
a dynamically-loaded extension point:
https://numba.readthedocs.io/en/stable/extending/entrypoints.html
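
As a rough sketch of how that mechanism works: a package advertises an
entry point named "init" in the "numba_extensions" group, and Numba looks
it up and calls it lazily, so the package never has to import Numba
itself. The module path and function name in the comment below
(pyarrow._numba_ext, _init_extension) are hypothetical placeholders, not
names from the actual work. The discovery side can be approximated with
the standard library alone:

```python
from importlib.metadata import entry_points

# A package would advertise its extension in its packaging metadata,
# for example (hypothetical module path and function name):
#
#   [project.entry-points."numba_extensions"]
#   init = "pyarrow._numba_ext:_init_extension"

def find_numba_extensions():
    """List installed entry points that Numba would treat as extensions."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ selection API
        group = eps.select(group="numba_extensions")
    else:
        # Python 3.8/3.9: entry_points() returns a dict of lists
        group = eps.get("numba_extensions", [])
    # Numba only looks at entry points named "init" in this group
    return [ep for ep in group if ep.name == "init"]

if __name__ == "__main__":
    for ep in find_numba_extensions():
        print(ep.name, "->", ep.value)
```

On a system where no package registers such an entry point, the function
simply returns an empty list; nothing is imported until an entry point is
actually loaded.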

(note: this preliminary investigation was supported by one of our fine
customers)

Regards

Antoine.



