Re: [PR] Update parse_protobuf_file_scan_config to remove any partition columns from the file_schema in FileScanConfig [arrow-datafusion]

via GitHub Mon, 05 Feb 2024 04:33:21 -0800


alamb commented on code in PR #9126:
URL: https://github.com/apache/arrow-datafusion/pull/9126#discussion_r1478136051



##########
datafusion/proto/src/physical_plan/from_proto.rs:
##########
@@ -560,9 +560,21 @@ pub fn parse_protobuf_file_scan_config(
         output_ordering.push(sort_expr);
     }
 
+    // Currently when the schema for the file is set the partition columns
+    // are present, which is illegal because they aren't actually in the files.
+    // This is a workaround to remove them from the schema.
+    let file_schema = Arc::new(Schema::new(

Review Comment:
   I don't undersrtand why the protobuf code would be doing this filtering -- 
wouldn't this potentially mask an error in whatever was creating the protobuf?
   
   Perhaps it would be better to apply this filtering prior to serializing the 
plan?
   
   If this is needed in the protobuf serialization code, can you please add a 
test to 
   1. Show how this happens and 
   2. Ensure we don't accidentally break this functionality in a future 
refactor?



##########
datafusion/proto/Cargo.toml:
##########
@@ -56,4 +56,4 @@ serde_json = { workspace = true, optional = true }
 [dev-dependencies]
 doc-comment = { workspace = true }
 strum = { version = "0.26.1", features = ["derive"] }
-tokio = "1.18"
+tokio = { version = "1.18", features = ["rt-multi-thread"] }

Review Comment:
   is this change needed?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] Update parse_protobuf_file_scan_config to remove any partition columns from the file_schema in FileScanConfig [arrow-datafusion]

Reply via email to