Re: [PR] Support Copy To Partitioned Files [arrow-datafusion]

via GitHub Thu, 15 Feb 2024 11:27:43 -0800


comphead commented on code in PR #9240:
URL: https://github.com/apache/arrow-datafusion/pull/9240#discussion_r1491519908



##########
datafusion/core/src/datasource/file_format/write/demux.rs:
##########
@@ -319,14 +319,22 @@ fn compute_partition_keys_by_row<'a>(
 ) -> Result<Vec<Vec<&'a str>>> {
     let mut all_partition_values = vec![];
 
-    for (col, dtype) in partition_by.iter() {
+    // For the purposes of writing partitioned data, we can rely on schema inference
+    // to determine the type of the partition cols in order to provide a more ergonomic
+    // UI which does not require specifying DataTypes manually. So, we ignore the
+    // DataType within the partition_by array and infer the correct type from the
+    // batch schema instead.
+    let schema = rb.schema();
+    for (col, _) in partition_by.iter() {
         let mut partition_values = vec![];
+
+        let dtype = schema.field_with_name(col)?.data_type();
         let col_array =
             rb.column_by_name(col)
                 .ok_or(DataFusionError::Execution(format!(
-                    "PartitionBy Column {} does not exist in source data!",
-                    col
-                )))?;
+            "PartitionBy Column {} does not exist in source data! Got schema {schema}.",

Review Comment:
   This can be shortened with the `exec_datafusion_err!` macro.
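   A minimal, std-only sketch of the two ideas in this hunk, under stated assumptions: `DataFusionError`, `DataType` and the `exec_datafusion_err!` macro below are hypothetical, simplified stand-ins (not DataFusion's real definitions; the real macro may, for example, also append a backtrace), and `infer_partition_types` is a made-up helper. It shows the partition column's type being looked up in the batch schema while the type supplied in `partition_by` is ignored, and the missing-column error being built through the macro instead of spelling out `DataFusionError::Execution(format!(...))`.

```rust
use std::fmt;

/// Hypothetical stand-in for DataFusion's error type; only `Execution` is modeled.
#[derive(Debug, PartialEq)]
enum DataFusionError {
    Execution(String),
}

impl fmt::Display for DataFusionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DataFusionError::Execution(msg) => write!(f, "Execution error: {msg}"),
        }
    }
}

/// Hypothetical stand-in for `exec_datafusion_err!`: formats the arguments
/// and wraps the message in `DataFusionError::Execution`.
macro_rules! exec_datafusion_err {
    ($($arg:tt)*) => {
        DataFusionError::Execution(format!($($arg)*))
    };
}

/// Hypothetical, reduced column type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Utf8,
    Int64,
}

/// Hypothetical helper: for each partition column, return the type found in
/// the batch schema, ignoring the type the caller put in `partition_by`.
fn infer_partition_types(
    schema: &[(&str, DataType)],
    partition_by: &[(&str, DataType)],
) -> Result<Vec<DataType>, DataFusionError> {
    partition_by
        .iter()
        .map(|(col, _declared)| {
            schema
                .iter()
                .find(|(name, _)| name == col)
                .map(|(_, dtype)| *dtype)
                // `ok_or_else` formats the message only when the column is missing.
                .ok_or_else(|| {
                    exec_datafusion_err!(
                        "PartitionBy Column {} does not exist in source data! Got schema {:?}.",
                        col,
                        schema
                    )
                })
        })
        .collect()
}

fn main() {
    let schema = [("region", DataType::Utf8), ("year", DataType::Int64)];
    // The declared Utf8 for "year" is ignored; the schema's Int64 wins.
    let types = infer_partition_types(&schema, &[("year", DataType::Utf8)]).unwrap();
    println!("{types:?}"); // [Int64]
    let err = infer_partition_types(&schema, &[("missing", DataType::Utf8)]).unwrap_err();
    println!("{err}");
}
```

   A side note on the design: the hunk uses `ok_or(...)`, which builds the error (including the formatted schema string) even when the column exists; `ok_or_else(|| exec_datafusion_err!(...))` defers that work to the failure path, which may be worth combining with the macro suggestion.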



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.