alamb commented on code in PR #9548:
URL: https://github.com/apache/arrow-datafusion/pull/9548#discussion_r1519644955


##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -540,6 +541,8 @@ async fn fetch_statistics(
 pub struct ParquetSink {
     /// Config options for writing data
     config: FileSinkConfig,
+    /// File metadata from successfully produced parquet files.

Review Comment:
   To use this API in a more general context, we would probably need to define 
how these `FileMetaData` mapped to the original files
   
   Maybe it could be something like this (which would also support the API 
below)
   
   
   ```rust
   HashMap<Path, FileMetaData>
   ```
   
   



##########
datafusion/execution/src/object_store.rs:
##########
@@ -60,6 +60,11 @@ impl ObjectStoreUrl {
     pub fn as_str(&self) -> &str {
         self.as_ref()
     }
+
+    /// Returns as Url
+    pub fn as_url(&self) -> &Url {

Review Comment:
   What is the accessor? `AsRef<str>`?



##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -563,13 +566,22 @@ impl DisplayAs for ParquetSink {
 impl ParquetSink {
     /// Create from config.
     pub fn new(config: FileSinkConfig) -> Self {
-        Self { config }
+        Self {
+            config,
+            written: Default::default(),
+        }
     }
 
     /// Retrieve the inner [`FileSinkConfig`].
     pub fn config(&self) -> &FileSinkConfig {
         &self.config
     }
+
+    /// Retrieve the file metadata from the last write.
+    pub fn written(&self) -> Vec<FileMetaData> {
+        self.written.lock().clone()
+    }

Review Comment:
   I think the comments here are out of date (it returns the FileMetaData 
written for *all* written files, not just the last one)
   
   As mentioned above, without any additional information I don't think there 
is any way to know how to match up the written files and the entries in the 
`Vec<FileMetaData>` without some more information
   
   However, if we changed the API to return a HashMap I think it would work well
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to