I recently created a small PoC of distributed query execution on Kubernetes
using the Rust implementation of Apache Arrow and the DataFusion query
engine [1].

This PoC uses gRPC to pass query plans to executor nodes and the proto file
[2] is largely based on the Gandiva proto file [3]. The PoC is very basic
but I think it demonstrates the power of having query plans as part of the
proto file. This would allow distributed applications to be built based on
Arrow standards in a way that is not dependent on any particular
implementation of Arrow and would even allow mixing and matching query
engines.

I wanted to start this discussion to see what the appetite is here for
accepting PRs to add query plan structures to the Gandiva proto file and
also whether we can consider making this an Arrow proto file rather than
being Gandiva-specific, over time.

Thanks,

Andy.

[1] https://github.com/andygrove/ballista

[2]
https://github.com/andygrove/ballista/blob/master/proto/ballista/ballista.proto

[3]
https://github.com/apache/arrow/blob/master/cpp/src/gandiva/proto/Types.proto

Reply via email to