devinjdangelo commented on code in PR #7244:
URL: https://github.com/apache/arrow-datafusion/pull/7244#discussion_r1290007315


##########
datafusion/core/src/datasource/file_format/parquet.rs:
##########
@@ -543,6 +574,172 @@ async fn fetch_statistics(
     Ok(statistics)
 }
 
+/// Implements [`DataSink`] for writing to a parquet file.
+struct ParquetSink {
+    /// Config options for writing data
+    config: FileSinkConfig,
+}
+
+impl Debug for ParquetSink {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.debug_struct("ParquetSink").finish()
+    }
+}
+
+impl DisplayAs for ParquetSink {
+    fn fmt_as(&self, t: DisplayFormatType, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        match t {
+            DisplayFormatType::Default | DisplayFormatType::Verbose => {
+                write!(
+                    f,
+                    "ParquetSink(writer_mode={:?}, file_groups=",
+                    self.config.writer_mode
+                )?;
+                FileGroupDisplay(&self.config.file_groups).fmt_as(t, f)?;
+                write!(f, ")")
+            }
+        }
+    }
+}
+
+impl ParquetSink {
+    fn new(config: FileSinkConfig) -> Self {
+        Self { config }
+    }
+
+    /// Builds a parquet WriterProperties struct, setting options as appropriate from TaskContext options
+    fn parquet_writer_props_from_context(
+        &self,
+        context: &Arc<TaskContext>,
+    ) -> WriterProperties {
+        let parquet_context = &context.session_config().options().execution.parquet;
+        let mut builder = WriterProperties::builder()
+            .set_created_by(parquet_context.created_by.clone())
+            .set_data_page_row_count_limit(parquet_context.data_page_row_count_limit)
+            .set_data_page_size_limit(parquet_context.data_pagesize_limit);

Review Comment:
   In this setup, it would also be better UX to push parsing the string up to 
when the user initially sets the config, so the user can get immediate feedback 
rather than a runtime error if they provide an invalid option. E.g.:
   
   ```
   set execution.parquet.compression = exotic_compression_123;
   PropertyError: unknown or unsupported parquet compression codec: exotic_compression_123
   ```
   
   Otherwise, the setting would appear to be accepted and would only fail much later, when a write is attempted.
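
To illustrate the idea, here is a minimal, standalone sketch of validating at set time. Everything in it is hypothetical and is not DataFusion's or parquet's actual API: the `Codec` enum, the `PropertyError` type, the simplified `ParquetOptions` struct, and its `set` method are stand-ins invented for illustration. The point is only that the string is parsed once, when the option is set, and the typed value is stored.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a parquet compression codec (not the real parquet type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Uncompressed,
    Snappy,
    Gzip,
    Zstd,
}

/// Hypothetical error type, named after the message in the example above.
#[derive(Debug, PartialEq, Eq)]
pub struct PropertyError(String);

impl fmt::Display for PropertyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "PropertyError: {}", self.0)
    }
}

impl std::error::Error for PropertyError {}

impl FromStr for Codec {
    type Err = PropertyError;

    /// Parses a codec name case-insensitively; unknown names are rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "uncompressed" => Ok(Codec::Uncompressed),
            "snappy" => Ok(Codec::Snappy),
            "gzip" => Ok(Codec::Gzip),
            "zstd" => Ok(Codec::Zstd),
            _ => Err(PropertyError(format!(
                "unknown or unsupported parquet compression codec: {}",
                s
            ))),
        }
    }
}

/// Hypothetical, simplified options struct that stores the parsed codec
/// rather than the raw string.
#[derive(Debug, Default)]
pub struct ParquetOptions {
    compression: Option<Codec>,
}

impl ParquetOptions {
    /// Validates the value when the option is set, so an invalid codec is
    /// reported immediately instead of at write time.
    pub fn set(&mut self, key: &str, value: &str) -> Result<(), PropertyError> {
        match key {
            "compression" => {
                // `?` returns early on a parse error, leaving the old value intact.
                self.compression = Some(value.parse()?);
                Ok(())
            }
            _ => Err(PropertyError(format!("unknown option: {}", key))),
        }
    }

    pub fn compression(&self) -> Option<Codec> {
        self.compression
    }
}
```

With this shape, `opts.set("compression", "exotic_compression_123")` returns the error right away, and the sink can later read an already-validated `Codec` without any string parsing in the write path.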



