carols10cents opened a new pull request #1887: URL: https://github.com/apache/arrow-datafusion/pull/1887
# Which issue does this PR close? Closes #1832. # Rationale for this change It would be nice for other projects (such as influxdata/influxdb_iox) to be able to serialize DataFusion types as protobuf without needing to depend on Ballista. # What changes are included in this PR? This PR extracts a new crate, datafusion-serialization, that can serialize and deserialize DataFusion types as protocol buffers. Ballista now depends on datafusion-serialization. However, [prost isn't able to provide a way to share/import .proto files between crates](), so `ballista/rust/core/proto/ballista.proto` doesn't have a line that says `import "datafusion.proto"`. This is a problem for other crates that would want to depend on datafusion-serialization too. Some solutions I considered and ruled out: - Have the build.rs file of `ballista` (and any other crate that wants to depend on datafusion-serialization) download `datafusion.proto` from GitHub. This seems fragile and makes building `ballista` dependent on network access. This would also likely introduce mismatched version problems. - Symlink `datafusion-serialization/proto/datafusion.proto` into `ballista/rust/core/proto`, which would work for `ballista` but not for any crate outside of this repo. - Manually copy `datafusion-serialization/proto/datafusion.proto` into `ballista/rust/core/proto`, which is a chore no one wants to do and would probably also cause mismatched version problems. Not liking any of these solutions, I decided the best way is that fields in `ballista` or other crate protos that want to contain `datafusion` types should serialize them as `google::protobuf::Any`, then depend on the `datafusion-serialization` crate to handle the actual interpretation of the bytes as the Rust types. Unfortunately, [prost doesn't have built in support for this](https://github.com/tokio-rs/prost/issues/299), and [the workaround crates I've looked at](https://github.com/tokio-rs/prost/pull/336) seem to provide more functionality than is strictly needed here. So I implemented this using a `TypeUrl` trait and some functions. It's a little messy because I can't use TryFrom/TryInto because of the orphan rule, so a bunch of the ballista from/to proto code needed to be updated. If there are other solutions I haven't thought of, I'd love to hear them! There's also plenty of further refactoring that could be done, but this PR is going to be a big review in any case. # Are there any user-facing changes? I'm not sure how strictly backwards compatibility is considered for ballista.proto-- I changed the types of a bunch of the fields, which isn't backwards compatible. If you'd rather I reserve the existing field names and pick a new name for the fields with the new type, let me know and I'm happy to do so. But as-is, it would be a user-facing change for anyone using the Ballista protobuf definitions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org