Some thoughts: 1. I think it would make sense to start with a design discussion/document about the goals and what we think is implementation specific versus generally applicable. In general, a distributed execution plan seems pretty implementation specific. My sense is that you'd never run a distributed execution plan outside of the knowledge of the particular execution environment it is running within. Part of that is usually distributed execution also includes lifecycle management. For example, if you're going to have work-stealing or early termination in your execution engine, those are operations that stitch into execution coordination (and thus a specific impl). If distributed execution is always engine specific, why try to create a general one for multiple engines? 2. With regards to making Gandiva protos more generic: I'd like to see more clarity on #1. On one hand, extending things so they are reused is good. On the other hand, the more consumers of an interface, the more overloads/non-impls you have for each consumer of it.
On Sat, Jul 20, 2019 at 10:18 AM Andy Grove <andygrov...@gmail.com> wrote: > I recently created a small PoC of distributed query execution on Kubernetes > using the Rust implementation of Apache Arrow and the DataFusion query > engine [1]. > > This PoC uses gRPC to pass query plans to executor nodes and the proto file > [2] is largely based on the Gandiva proto file [3]. The PoC is very basic > but I think it demonstrates the power of having query plans as part of the > proto file. This would allow distributed applications to be built based on > Arrow standards in a way that is not dependent on any particular > implementation of Arrow and would even allow mixing and matching query > engines. > > I wanted to start this discussion to see what the appetite is here for > accepting PRs to add query plan structures to the Gandiva proto file and > also whether we can consider making this an Arrow proto file rather than > being Gandiva-specific, over time. > > Thanks, > > Andy. > > [1] https://github.com/andygrove/ballista > > [2] > > https://github.com/andygrove/ballista/blob/master/proto/ballista/ballista.proto > > [3] > > https://github.com/apache/arrow/blob/master/cpp/src/gandiva/proto/Types.proto >