Re: [Discuss][Python] Numba support for PyArrow

Dewey Dunnington Sun, 29 Mar 2026 19:07:07 -0700

Cool!

Just wondering: if you used arrow-rs, sparrow, or nanoarrow, could you
develop this as a separate project without adding another component to
PyArrow?


Cheers,

-dewey

On Fri, Mar 27, 2026 at 3:29 AM Raúl Cumplido <[email protected]> wrote:
>
> Hi Vignesh
>
> As for the release schedule concerns, that argument doesn't hold up.
> Otherwise we would have to tie our releases to NumPy, pandas or
> others.
>
> It is just business as usual to test against a set of versions on our
> CI and keep our releases independent from any third party.
>
> Obviously any new feature to the project has a maintenance burden
> associated with it but I am unsure about the "potential dilution of
> pyarrow's core focus as a universal columnar data layer". Enabling
> better support and integrations with the Python scientific computing
> ecosystem has been part of the scope of the project.
>
> And as Antoine mentioned, the integration needs C++ internals without
> a stable ABI, which makes an external package fragile. That's, as far
> as I understand it, the same reason our pandas/NumPy integration lives
> in PyArrow.
>
> Regards,
> Raúl
>
> On Fri, 27 Mar 2026 at 4:28, Vignesh Siva
> (<[email protected]>) wrote:
> >
> > Thanks, Li Jin,
> >
> > While integrating the Numba layer directly into PyArrow offers benefits
> > like potentially simpler user experience and direct access to C++ internals
> > without ABI concerns, there are several potential downsides from the
> > perspective of PyArrow's core development and project management. Firstly,
> > it would significantly increase the maintenance burden on the PyArrow
> > development team. This includes not only supporting the Numba integration
> > code itself but also ensuring its compatibility with future Numba and Arrow
> > releases and debugging issues specific to this integration. This could
> > divert resources from PyArrow's core mission and broader development.
> >
> > Secondly, it could lead to an expansion of PyArrow's scope and a potential
> > dilution of its core focus as a universal columnar data layer. Adding
> > highly specialized integrations, even optional ones, can make the project
> > larger and more complex for new contributors to navigate. It also ties the
> > release cycles of Numba-specific features to PyArrow's release schedule,
> > which might not always align. An external package, while facing ABI
> > challenges, allows for more agile development, independent release cycles,
> > and a dedicated community focused solely on the Numba-PyArrow interface,
> > without adding overhead to the main PyArrow project.
> >
> > Regards,
> > Vignesh
> >
> > On Fri, 27 Mar 2026 at 05:51, Li Jin <[email protected]> wrote:
> >
> > > Hi Antoine,
> > >
> > > This is exciting work. I am generally in favor of putting it inside PyArrow
> > > for ease of use and the ABI reasons above. Can you explain a bit more what
> > > the downsides are of putting it in PyArrow vs a separate package?
> > >
> > > Li
> > >
> > > On Thu, Mar 26, 2026 at 11:08 AM Antoine Pitrou <[email protected]>
> > > wrote:
> > >
> > > >
> > > > Hello,
> > > >
> > > > > Numba (https://numba.pydata.org/) is a Just-in-Time compiler for Python
> > > > > that speeds up scientific calculations written in Python. Out
> > > > > of the box, Numba supports NumPy arrays (which were the primary target
> > > > > for its design).
> > > >
> > > > We (at QuantStack) have been investigating the feasibility of supporting
> > > > a subset of PyArrow in Numba, so that the fast computation abilities of
> > > > Numba can extend to data in the Arrow format.
> > > >
> > > > We have come to the conclusion that supporting a small subset of PyArrow
> > > > is definitely doable, at a competitive performance level (between "as
> > > > > fast as C++" and "4x slower" on a couple of preliminary micro-benchmarks).
> > > >
> > > > (by "small subset" we mostly mean: primitive data types, reading and
> > > > building arrays)
> > > >
> > > > The Numba integration layer would ideally have to be maintained and
> > > > distributed within PyArrow, because of the need to access a number of
> > > > Arrow C++ APIs, which don't have a stable ABI (it *might* be possible to
> > > > work around this by exporting a dedicated C-like ABI from PyArrow,
> > > > > though).
> > > >
> > > > What we would like to know is how the community feels about putting this
> > > > code inside PyArrow, rather than a separate package, for the reason
> > > > given above.
> > > >
> > > > This would *not* add a dependency on Numba, since this can be exposed as
> > > > a dynamically-loaded extension point:
> > > > https://numba.readthedocs.io/en/stable/extending/entrypoints.html
> > > >
> > > > (note: this preliminary investigation was supported by one of our fine
> > > > customers)
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > >
