Hi folks,

I jotted down some high-level ideas about the directions in which I'd like to push the various parts of the project on the C++ side, along with the language bindings in Python, R, Ruby, and others. Many of you may know that I am building a not-for-profit open source development team to focus on Apache Arrow (https://ursalabs.org/), so this document is partly for my colleagues to organize some lower-level technical discussions and planning in the Arrow JIRA. I'm interested in feedback from the whole Arrow community, and we would obviously love to have as many people as possible involved who have an interest in the C++ libraries and their bindings.

The simplified summary is that I would like to work toward an embeddable in-memory query engine in C++ that can be used in all the bindings. Such an engine could be used in numerous contexts, from data frame libraries to streaming data transformation. As a simple example, we could compile filter expressions with Gandiva and apply them to a stream of record batches being materialized from a directory of Parquet files. There are a lot of pieces that still have to fall into place to do this in a sustainable and non-hacky way.

https://docs.google.com/document/d/12dWBniKW2JQ-5djE3SPjyQXVquCAEmLXVlb1dnhLhQ0/edit#heading=h.62rx18p423rw

Looking forward to the feedback of others!

Thanks
Wes