Hi all -

I'd like to share a library I've been working on for a few months, which is built on top of Arrow. It's called quivr <https://github.com/spenczar/quivr> (like a bundle of arrows), and it can be thought of as a set of tools that wrap PyArrow Tables and extend their capabilities.

I work on scientific software. A lot of the initial scientific work is done in Jupyter notebooks with dataframes. When it's time to build larger production systems on top of that work, the flexibility of dataframes becomes a liability. It's hard to write structured code because dataframes can be so variably typed and permissive. But if you try to use normal tools for this (Python objects, lists, dictionaries), you get crushed by performance issues.

I wanted an array-oriented framework, but with a more structured model than any of the dataframe libraries out there. So, quivr fills that need. You write a *Table* definition, which corresponds closely to a pyarrow Table schema. You do that by writing a Python class, with class attributes signaling the names and types of your columns. You can then attach methods to that class to describe computation.

By using Arrow's struct types, Tables can be composed. You might have a Table which defines a "Location" - and has sophisticated logic for that purpose - and reuse that Location within other, higher-order tables. This compositional approach has been working extremely well so far in our work.

I've written a little blog post <https://journal.spencerwnelson.com/entries/quivr.html> describing the motivations and showing it in use, and docs are up too <https://quivr.readthedocs.io/en/stable/>.

quivr is still in a pretty molten state, so I'm very interested in any feedback, or in hearing from anyone who might find it useful. I'd love to work more closely with the Arrow team as well - I have a growing wishlist of features around PyArrow which I'd be interested in working on.

Thanks,
Spencer