[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #7141: Unify DataFrame and SQL (Insert Into) Write Methods

via GitHub Tue, 01 Aug 2023 14:02:56 -0700


alamb commented on code in PR #7141:
URL: https://github.com/apache/arrow-datafusion/pull/7141#discussion_r1281150280



##########
datafusion/core/src/datasource/physical_plan/mod.rs:
##########
@@ -330,6 +330,8 @@ pub struct FileSinkConfig {
     pub object_store_url: ObjectStoreUrl,
     /// A vector of [`PartitionedFile`] structs, each representing a file partition
     pub file_groups: Vec<PartitionedFile>,
+    /// Vector of partition paths
+    pub table_paths: Vec<ListingTableUrl>,

Review Comment:
   I am not sure -- the notion of tables that are modified by DataFusion wasn't 
really in the initial design -- which was focused on reads.
   
   @metesynnada has started the process to add the ability to write to some 
tables, but I think there is still a ways to go.
   
   Ideally, in my mind, most of the code to write data to a sink (CSV, JSON, etc) 
would not be tied to a TableProvider. Each table provider could provide some 
adapter that makes the appropriate sink for updating its state, if that makes sense.
   
   Thus I would suggest focusing on making writes to a sink that is not connected 
to a table provider work well, and then we can wire that API up into the 
relevant TableProviders if desired.
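
   To make the suggested split concrete, here is a minimal std-only sketch. It is 
not DataFusion's actual API: `RowSink`, `CsvSink`, `TrackedTable`, `write_rows` 
and `insert_into` are all hypothetical names chosen for illustration, and rows 
are plain strings rather than record batches. The sink only knows how to 
serialize rows; the table-side adapter owns the state update.

```rust
use std::io::{self, Write};

/// Hypothetical sink abstraction: serializes rows and knows nothing about tables.
pub trait RowSink {
    /// Writes every row and returns how many rows were written.
    fn write_rows(&mut self, rows: &[Vec<String>]) -> io::Result<u64>;
}

/// Hypothetical CSV sink over any `Write` target (file, buffer, socket).
/// Simplification: fields are joined with commas, with no quoting or escaping.
pub struct CsvSink<W: Write> {
    out: W,
}

impl<W: Write> CsvSink<W> {
    pub fn new(out: W) -> Self {
        Self { out }
    }

    /// Returns the underlying writer so the caller can inspect or close it.
    pub fn into_inner(self) -> W {
        self.out
    }
}

impl<W: Write> RowSink for CsvSink<W> {
    fn write_rows(&mut self, rows: &[Vec<String>]) -> io::Result<u64> {
        for row in rows {
            writeln!(self.out, "{}", row.join(","))?;
        }
        Ok(rows.len() as u64)
    }
}

/// Hypothetical table-side adapter: delegates the write to a sink,
/// then updates the table's own state (here, just a row count).
pub struct TrackedTable {
    pub row_count: u64,
}

impl TrackedTable {
    pub fn insert_into(
        &mut self,
        sink: &mut dyn RowSink,
        rows: &[Vec<String>],
    ) -> io::Result<u64> {
        let written = sink.write_rows(rows)?;
        self.row_count += written;
        Ok(written)
    }
}
```

   In this shape `CsvSink` can be used directly, with no table involved, while a 
table that wants insert support only has to supply the small adapter.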



