Hi all - I'd like to share a library I've been working on for a few months
which is built on top of Arrow. It's called quivr
<https://github.com/spenczar/quivr> (like a bundle of arrows) and it is
best thought of as a set of tools for wrapping PyArrow Tables and extending
their capabilities.

I work on scientific software. A lot of the initial scientific work is done
in Jupyter notebooks with dataframes. When it's time to build larger
production systems on top of that work, the flexibility of dataframes
becomes a liability. It's hard to write structured code because dataframes
can be so variably typed and permissive.

But if you try to use plain Python tools instead (objects, lists,
dictionaries), you get crushed by performance issues. I wanted an
array-oriented framework, but with a more structured model than any
dataframe library out there.

So, quivr fills that need. You write a *Table* definition, which
corresponds closely to a pyarrow Table schema. You do that by writing a
Python class, with class attributes signaling the types and names of your
columns. And then you can attach methods to describe computation.

By using Arrow's struct types, Tables can be composed. You might have a
Table which defines a "Location" - and has sophisticated logic for that
purpose - and reuse that Location within other, higher-order tables. This
compositional approach has worked extremely well for us so far.

I've written a little blog post
<https://journal.spencerwnelson.com/entries/quivr.html> describing the
motivations and showing it in use, and docs are up too
<https://quivr.readthedocs.io/en/stable/>. quivr is still in a pretty
molten state, so I'm very interested in any feedback or broader interest in
this from anyone who might find it useful. I'd love to work more closely
with the Arrow team as well - I have a growing wishlist of features around
PyArrow which I'd be interested in working on.

Thanks,
Spencer
