devinjdangelo commented on code in PR #9240:
URL: https://github.com/apache/arrow-datafusion/pull/9240#discussion_r1491159044
##########
datafusion/proto/src/logical_plan/mod.rs:
##########
@@ -1641,6 +1642,7 @@ impl AsLogicalPlan for LogicalPlanNode {
output_url,
file_format,
copy_options,
+ partition_by: _,
Review Comment:
Note that I did not add support for partition_by in proto. We should add a
follow up ticket for this.
I don't believe this PR will break downstream systems like Ballista's
handling of COPY, but it will silently ignore partition_by options until prost
is updated.
##########
datafusion/core/src/physical_planner.rs:
##########
@@ -585,13 +586,20 @@ impl DefaultPhysicalPlanner {
CopyOptions::WriterOptions(writer_options) =>
*writer_options.clone()
};
+ // Note: the DataType passed here is ignored for the
purposes of writing and inferred instead
Review Comment:
The read path needs an explicit DataType defined for the partition cols so
it knows what to cast to, but I realized that the write path can just infer the
correct DataType from the RecordBatch schema.
This allows COPY to only specify partition columns by name and not have to
worry about specifying the correct data type.
##########
datafusion/sqllogictest/test_files/copy.slt:
##########
@@ -25,6 +25,44 @@ COPY source_table TO 'test_files/scratch/copy/table/'
(format parquet, compressi
----
2
+# Copy to directory as partitioned files
+query IT
+COPY source_table TO 'test_files/scratch/copy/partitioned_table1/' (format
parquet, compression 'zstd(10)', partition_by 'col2');
+----
+2
+
+# validate multiple partitioned parquet file output
+statement ok
+CREATE EXTERNAL TABLE validate_partitioned_parquet STORED AS PARQUET
+LOCATION 'test_files/scratch/copy/partitioned_table1/' PARTITIONED BY (col2);
+
+query I?
+select * from validate_partitioned_parquet;
+----
+2 Bar
+1 Foo
+
+# Copy to directory as partitioned files
+query ITT
+COPY (values (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z')) TO
'test_files/scratch/copy/partitioned_table2/'
+(format parquet, compression 'zstd(10)', partition_by 'column2, column3');
Review Comment:
Anonymous columns get the name "columnX" based on their order in the VALUES
clause. It would be nice to document this somewhere, though I did make sure it
is relatively easy to discover this based on the error message if you get a
column name wrong.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]