houqp commented on pull request #1072: URL: https://github.com/apache/arrow-datafusion/pull/1072#issuecomment-939230833
Thanks @rdettai for the detailed write up! I think going static should be good enough to unblock our development in the short run, but I agree with you that this is just a short-term workaround.

To make object stores truly pluggable, we need to serialize them into unique values (e.g. the URI scheme) stored as generic strings instead of as an enum in protobuf. This way, if a user compiles in a new object store from a custom crate, they can still get it to work without having to change the Ballista protobuf file.

In fact, I think we need to do the same thing for table providers as well: hardcoding table providers in protobuf leads to the same restriction. For example, it's not possible to use delta-rs's custom table provider with Ballista at the moment.

Given that the current logical plan deserialization code only takes the serialized protobuf plan as input, I think we would have to go with the lazy two-pass deserialization approach proposed by @alamb. Alternatively, we could change the deserialization call in the scheduler to pass in the execution context, which would make it much easier to implement more dynamic deserialization logic for both object stores and table providers. I don't see a strong reason to avoid referencing the execution context during logical plan deserialization.
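To illustrate the string-based approach: instead of matching on a protobuf enum variant per object store, the deserializer would look up a store by the URI scheme string in a registry. Below is a minimal, self-contained sketch; the `ObjectStore` trait, `LocalFs` type, and `ObjectStoreRegistry` here are simplified stand-ins invented for this example, not the actual DataFusion/Ballista APIs.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in for an object store abstraction (hypothetical).
trait ObjectStore {
    fn scheme(&self) -> &str;
}

struct LocalFs;
impl ObjectStore for LocalFs {
    fn scheme(&self) -> &str {
        "file"
    }
}

// Registry that resolves stores by URI scheme string at deserialization
// time, rather than by a hardcoded protobuf enum. A custom crate can
// register its own store without touching the .proto definitions.
struct ObjectStoreRegistry {
    stores: HashMap<String, Arc<dyn ObjectStore>>,
}

impl ObjectStoreRegistry {
    fn new() -> Self {
        Self { stores: HashMap::new() }
    }

    fn register(&mut self, scheme: &str, store: Arc<dyn ObjectStore>) {
        self.stores.insert(scheme.to_string(), store);
    }

    // Resolve the store for a serialized URI such as
    // "file:///tmp/data.parquet" by splitting off its scheme.
    fn resolve(&self, uri: &str) -> Option<Arc<dyn ObjectStore>> {
        let scheme = uri.split("://").next()?;
        self.stores.get(scheme).cloned()
    }
}

fn main() {
    let mut registry = ObjectStoreRegistry::new();
    registry.register("file", Arc::new(LocalFs));
    let store = registry
        .resolve("file:///tmp/data.parquet")
        .expect("unknown scheme");
    println!("{}", store.scheme()); // prints "file"
}
```

Passing the execution context into the deserialization call would give the deserializer access to a registry like this one, so unknown schemes become a runtime lookup failure rather than a protobuf schema change.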
