[GitHub] [arrow-datafusion] devinjdangelo commented on a diff in pull request #7141: Unify DataFrame and SQL (Insert Into) Write Methods

via GitHub Sun, 30 Jul 2023 09:33:12 -0700


devinjdangelo commented on code in PR #7141:
URL: https://github.com/apache/arrow-datafusion/pull/7141#discussion_r1278589473



##########
datafusion/core/src/datasource/listing/table.rs:
##########
@@ -804,21 +804,25 @@ impl TableProvider for ListingTable {
         .await?;
 
         let file_groups = file_list_stream.try_collect::<Vec<_>>().await?;
-
-        if file_groups.len() > 1 {
-            return Err(DataFusionError::Plan(
-                "Datafusion currently supports tables from single partition and/or file."
-                    .to_owned(),
-            ));
+        let writer_mode;
+        //if we are writing a single output_partition to a table backed by a single file
+        //we can append to that file. Otherwise, we can write new files into the directory
+        //adding new files to the listing table in order to insert to the table.
+        let input_partitions = input.output_partitioning().partition_count();
+        if file_groups.len() == 1 && input_partitions == 1 {

Review Comment:
   This logic works, but it feels inflexible to me. There may be a more explicit way for a user to express their intention, similar to Spark's [Save Modes](https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#save-modes) and `partitionBy` method.
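
   To illustrate the difference, here is a rough sketch and not a proposal for the actual API. None of these names exist in DataFusion: `SaveMode`, `WriteAction`, `inferred_action`, and `explicit_action` are all hypothetical, and the mode names are a subset borrowed from Spark's save modes. The first function mirrors the condition in this diff; the second shows how the same decision might look if the caller stated their intent.

```rust
/// Hypothetical: how the caller wants existing table data treated.
/// Names are borrowed from a subset of Spark's save modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SaveMode {
    Append,
    Overwrite,
    ErrorIfExists,
}

/// Hypothetical: what the table's writer would end up doing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WriteAction {
    AppendToExistingFile,
    AddNewFiles,
    ReplaceAllFiles,
}

/// Mirrors the rule in this diff: the behavior is inferred purely from
/// the number of existing files and the number of input partitions.
pub fn inferred_action(existing_files: usize, input_partitions: usize) -> WriteAction {
    if existing_files == 1 && input_partitions == 1 {
        WriteAction::AppendToExistingFile
    } else {
        WriteAction::AddNewFiles
    }
}

/// Assumed alternative: the caller passes a mode and says whether they
/// want a single output file, so nothing is guessed from counts alone.
pub fn explicit_action(
    mode: SaveMode,
    single_file_output: bool,
    existing_files: usize,
) -> Result<WriteAction, String> {
    match mode {
        SaveMode::ErrorIfExists if existing_files > 0 => {
            Err(format!("table already contains {existing_files} file(s)"))
        }
        SaveMode::ErrorIfExists => Ok(WriteAction::AddNewFiles),
        SaveMode::Overwrite => Ok(WriteAction::ReplaceAllFiles),
        SaveMode::Append if single_file_output && existing_files == 1 => {
            Ok(WriteAction::AppendToExistingFile)
        }
        SaveMode::Append => Ok(WriteAction::AddNewFiles),
    }
}
```

   With the inferred rule, a plan that happens to have four output partitions silently switches from appending to adding new files. With an explicit mode, that choice would be visible at the call site.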



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.