JoshElkind opened a new pull request, #20284: URL: https://github.com/apache/datafusion/pull/20284
## Which issue does this PR close?

- Closes #20280.

## Rationale for this change

Physical plans that read Arrow files (.arrow / IPC) could not be serialized or deserialized via the proto layer. PhysicalPlanNode already had scan nodes for Parquet, CSV, JSON, Avro, and in-memory sources, but not for Arrow, so a DataSourceExec using ArrowSource was not round-trippable. That blocked use cases like distributing plans that scan Arrow files (e.g. Ballista). This change adds Arrow scan to the proto layer so those plans can be serialized and deserialized like the other file formats.

## What changes are included in this PR?

- Proto: Added ArrowScanExecNode (with FileScanExecConf base_conf) and arrow_scan = 38 to the PhysicalPlanNode oneof in datafusion.proto.
- Generated code: Updated prost.rs and pbjson.rs to include ArrowScanExecNode and the ArrowScan variant (manual edits; protoc was not run).
- To-proto: In try_from_data_source_exec, when the data source is a FileScanConfig whose file source is ArrowSource, it is now serialized as ArrowScanExecNode.
- From-proto: Implemented try_into_arrow_scan_physical_plan to deserialize ArrowScanExecNode into DataSourceExec with ArrowSource; missing base_conf returns an explicit error (no .unwrap()).
- Test: Added roundtrip_arrow_scan in roundtrip_physical_plan.rs to assert Arrow scan plans round-trip correctly.

## Are these changes tested?

Yes. A new test roundtrip_arrow_scan builds a physical plan that scans Arrow files, serializes it to bytes and deserializes it back, and asserts the round-tripped plan matches the original. The full cargo test -p datafusion-proto suite (150 tests: unit, integration, and doc tests) passes, including all existing roundtrip and serialization tests.

## Are there any user-facing changes?

No. This only extends the existing physical-plan proto support to Arrow scan. Callers that already serialize/deserialize physical plans (e.g.
for distributed execution) can now round-trip plans that read Arrow files in addition to Parquet, CSV, JSON, and Avro, with no API or behavioral changes for existing usage.
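
For readers unfamiliar with the to-proto / from-proto pattern described in this PR, here is a minimal, self-contained sketch of the idea. It does not use the real datafusion-proto types: `FileScanExecConf`, `ArrowScanExecNode`, and `PhysicalPlanType` below are simplified stand-ins that only borrow the names from the description, and `ArrowScanPlan`, `to_proto`, and `from_proto` are hypothetical helpers invented for illustration. The point it tries to show is the handling of a missing `base_conf` as an explicit error rather than an `.unwrap()`.

```rust
// Simplified stand-ins for the generated proto messages (NOT the real
// datafusion-proto definitions; only the names follow the PR description).
#[derive(Debug, Clone, PartialEq)]
pub struct FileScanExecConf {
    pub file_paths: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ArrowScanExecNode {
    // Message-typed proto fields are optional on the wire, so this is modeled
    // as an Option that may be None after decoding.
    pub base_conf: Option<FileScanExecConf>,
}

// Stand-in for the PhysicalPlanNode oneof, reduced to the one new variant.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalPlanType {
    ArrowScan(ArrowScanExecNode),
}

// Hypothetical stand-in for a DataSourceExec whose file source is ArrowSource.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrowScanPlan {
    pub file_paths: Vec<String>,
}

// To-proto direction: wrap the scan configuration in the Arrow scan node.
pub fn to_proto(plan: &ArrowScanPlan) -> PhysicalPlanType {
    PhysicalPlanType::ArrowScan(ArrowScanExecNode {
        base_conf: Some(FileScanExecConf {
            file_paths: plan.file_paths.clone(),
        }),
    })
}

// From-proto direction: a missing base_conf becomes an Err instead of a panic.
pub fn from_proto(node: &PhysicalPlanType) -> Result<ArrowScanPlan, String> {
    match node {
        PhysicalPlanType::ArrowScan(scan) => {
            let conf = scan
                .base_conf
                .as_ref()
                .ok_or_else(|| "ArrowScanExecNode is missing base_conf".to_string())?;
            Ok(ArrowScanPlan {
                file_paths: conf.file_paths.clone(),
            })
        }
    }
}
```

In this sketch, calling `from_proto(&to_proto(&plan))` yields a plan equal to the original, which mirrors what the roundtrip_arrow_scan test asserts for the real types, while a node with `base_conf: None` produces an `Err`.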
