Some thoughts:

   1. I think it would make sense to start with a design
   discussion/document about the goals and what we think is implementation
   specific versus generally applicable. In general, a distributed execution
   plan seems pretty implementation specific. My sense is that you'd never run
   a distributed execution plan outside of the knowledge of the particular
   execution environment it is running within. Part of that is usually
   distributed execution also includes lifecycle management. For example, if
   you're going to have work-stealing  or early termination in your execution
   engine, those are operations that stitch into execution coordination (and
   thus a specific impl). If distributed execution is always engine specific,
   why try to create a general one for multiple engines?
   2. With regards to making Gandiva protos more generic: I'd like to see
   more clarity on #1. On one hand, extending things so they are reused is
   good. On the other hand, the more consumers of an interface, the more
   overloads/non-impls you have for each consumer of it.


On Sat, Jul 20, 2019 at 10:18 AM Andy Grove <andygrov...@gmail.com> wrote:

> I recently created a small PoC of distributed query execution on Kubernetes
> using the Rust implementation of Apache Arrow and the DataFusion query
> engine [1].
>
> This PoC uses gRPC to pass query plans to executor nodes and the proto file
> [2] is largely based on the Gandiva proto file [3]. The PoC is very basic
> but I think it demonstrates the power of having query plans as part of the
> proto file. This would allow distributed applications to be built based on
> Arrow standards in a way that is not dependent on any particular
> implementation of Arrow and would even allow mixing and matching query
> engines.
>
> I wanted to start this discussion to see what the appetite is here for
> accepting PRs to add query plan structures to the Gandiva proto file and
> also whether we can consider making this an Arrow proto file rather than
> being Gandiva-specific, over time.
>
> Thanks,
>
> Andy.
>
> [1] https://github.com/andygrove/ballista
>
> [2]
>
> https://github.com/andygrove/ballista/blob/master/proto/ballista/ballista.proto
>
> [3]
>
> https://github.com/apache/arrow/blob/master/cpp/src/gandiva/proto/Types.proto
>

Reply via email to