alamb commented on code in PR #8162:
URL: https://github.com/apache/arrow-rs/pull/8162#discussion_r2360358117


##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -906,6 +918,12 @@ impl ArrowRowGroupWriterFactory {
         let writers = get_column_writers(&self.schema, &self.props, &self.arrow_schema)?;
         Ok(ArrowRowGroupWriter::new(writers, &self.arrow_schema))
     }
+
+    /// Create column writers for a new row group.
+    pub fn create_column_writers(&self, row_group_index: usize) -> Result<Vec<ArrowColumnWriter>> {

Review Comment:
   I am not sure that making `ArrowRowGroupWriter` public gains us much, and it would not allow per-column parallel encoding.
   
   One benefit of getting the column writers individually is that the columns can then be encoded in parallel. With `ArrowRowGroupWriter`, parallelism is only possible at row-group granularity.
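
To make the per-column idea concrete, here is a minimal std-only sketch of encoding each column of one row group on its own thread. `ColumnEncoder` and `encode_columns_in_parallel` are hypothetical stand-ins invented for illustration. They are not the parquet crate's `ArrowColumnWriter` API, and the "encoding" is just little-endian bytes.

```rust
use std::thread;

/// Hypothetical stand-in for a per-column writer.
/// This is NOT the parquet crate's `ArrowColumnWriter`.
struct ColumnEncoder {
    buf: Vec<u8>,
}

impl ColumnEncoder {
    fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// Append the values as little-endian bytes (a toy "encoding").
    fn write(&mut self, values: &[i32]) {
        for v in values {
            self.buf.extend_from_slice(&v.to_le_bytes());
        }
    }

    /// Finish the column and hand back the encoded chunk.
    fn close(self) -> Vec<u8> {
        self.buf
    }
}

/// Encode every column of one row group on its own scoped thread.
/// The returned chunks are in the same order as the input columns.
fn encode_columns_in_parallel(columns: &[Vec<i32>]) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        // Spawn all encoders first so they run concurrently.
        let handles: Vec<_> = columns
            .iter()
            .map(|col| {
                s.spawn(move || {
                    let mut enc = ColumnEncoder::new();
                    enc.write(col);
                    enc.close()
                })
            })
            .collect();

        // Join in column order so the output order is deterministic.
        handles
            .into_iter()
            .map(|h| h.join().expect("encoder thread panicked"))
            .collect()
    })
}

fn main() {
    let columns = vec![vec![1, 2, 3], vec![10, 20], vec![]];
    let chunks = encode_columns_in_parallel(&columns);
    for (i, chunk) in chunks.iter().enumerate() {
        println!("column {i}: {} bytes", chunk.len());
    }
}
```

Each encoder is owned by exactly one thread, which is why handing out the column writers individually (rather than a single row-group writer that owns all of them) is what makes this shape possible.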
   
   I looked at `ArrowRowGroupWriter` a bit more, and the only substantial thing it does is call [compute_leaves](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/fn.compute_leaves.html) in a loop, and that function is already public:
   
   https://github.com/apache/arrow-rs/blob/bac36900826e411564231b89e3eb544ea9082cab/parquet/src/arrow/arrow_writer/mod.rs#L831-L835



