carols10cents opened a new pull request #1887:
URL: https://github.com/apache/arrow-datafusion/pull/1887


   # Which issue does this PR close?
   
   Closes #1832.
   
   # Rationale for this change
   
   It would be nice for other projects (such as influxdata/influxdb_iox) to be 
able to serialize DataFusion types as protobuf without needing to depend on 
Ballista.
   
   # What changes are included in this PR?
   
   This PR extracts a new crate, datafusion-serialization, that can serialize 
and deserialize DataFusion types as protocol buffers.
   
   Ballista now depends on datafusion-serialization. However, [prost isn't able 
to provide a way to share/import .proto files between crates](), so 
`ballista/rust/core/proto/ballista.proto` doesn't have a line that says `import 
"datafusion.proto"`. This is a problem for other crates that would want to 
depend on datafusion-serialization too. Some solutions I considered and ruled 
out:
   
   - Have the build.rs file of `ballista` (and any other crate that wants to 
depend on datafusion-serialization) download `datafusion.proto` from GitHub. 
This seems fragile and makes building `ballista` dependent on network access. 
This would also likely introduce mismatched version problems.
   - Symlink `datafusion-serialization/proto/datafusion.proto` into 
`ballista/rust/core/proto`, which would work for `ballista` but not for any 
crate outside of this repo.
   - Manually copy `datafusion-serialization/proto/datafusion.proto` into 
`ballista/rust/core/proto`, which is a chore no one wants to do and would 
probably also cause mismatched version problems.
   
   Not liking any of these solutions, I decided the best way is that fields in 
`ballista` or other crate protos that want to contain `datafusion` types should 
serialize them as `google::protobuf::Any`, then depend on the 
`datafusion-serialization` crate to handle the actual interpretation of the 
bytes as the Rust types.
   
   Unfortunately, [prost doesn't have built in support for 
this](https://github.com/tokio-rs/prost/issues/299), and [the workaround crates 
I've looked at](https://github.com/tokio-rs/prost/pull/336) seem to provide 
more functionality than is strictly needed here. So I implemented this using a 
`TypeUrl` trait and some functions. It's a little messy because I can't use 
TryFrom/TryInto because of the orphan rule, so a bunch of the ballista from/to 
proto code needed to be updated.
   
   If there are other solutions I haven't thought of, I'd love to hear them!
   
   There's also plenty of further refactoring that could be done, but this PR 
is going to be a big review in any case.
   
   # Are there any user-facing changes?
   
   I'm not sure how strictly backwards compatibility is considered for 
ballista.proto-- I changed the types of a bunch of the fields, which isn't 
backwards compatible. If you'd rather I reserve the existing field names and 
pick a new name for the fields with the new type, let me know and I'm happy to 
do so. But as-is, it would be a user-facing change for anyone using the 
Ballista protobuf definitions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to