alamb commented on code in PR #8162:
URL: https://github.com/apache/arrow-rs/pull/8162#discussion_r2360358117
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -906,6 +918,12 @@ impl ArrowRowGroupWriterFactory {
let writers = get_column_writers(&self.schema, &self.props, &self.arrow_schema)?;
Ok(ArrowRowGroupWriter::new(writers, &self.arrow_schema))
}
+
+ /// Create column writers for a new row group.
+ pub fn create_column_writers(&self, row_group_index: usize) -> Result<Vec<ArrowColumnWriter>> {
Review Comment:
I am not sure making `ArrowRowGroupWriter` public gets us much of
anything, and it would not allow per-column parallel encoding.
One benefit of getting the column writers individually is that the
columns can then be encoded in parallel. With `ArrowRowGroupWriter`, only
whole row groups can be written in parallel.
I looked at `ArrowRowGroupWriter` a bit more, and the only substantial thing
it does is call
[compute_leaves](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/fn.compute_leaves.html)
in a loop, and that function is already public.
https://github.com/apache/arrow-rs/blob/bac36900826e411564231b89e3eb544ea9082cab/parquet/src/arrow/arrow_writer/mod.rs#L831-L835
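
To illustrate what handing out the column writers individually makes possible, here is a rough std-only sketch of the per-column pattern. It does not use the parquet crate: `Leaf` and `ColumnEncoder` are hypothetical stand-ins for a leaf column and a per-column writer, and `encode_columns_in_parallel` is a made-up helper name. The only point is that each writer can be moved to its own thread once the caller owns them one by one.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one leaf column of a batch (not a parquet type).
pub struct Leaf(pub Vec<i32>);

/// Hypothetical stand-in for a per-column writer (not `ArrowColumnWriter`).
pub struct ColumnEncoder {
    buf: Vec<u8>,
}

impl ColumnEncoder {
    pub fn new() -> Self {
        Self { buf: Vec::new() }
    }

    /// "Encode" a leaf by appending each value as little-endian bytes.
    pub fn write(&mut self, leaf: &Leaf) {
        for v in &leaf.0 {
            self.buf.extend_from_slice(&v.to_le_bytes());
        }
    }

    /// Finish the column and hand back the encoded chunk.
    pub fn close(self) -> Vec<u8> {
        self.buf
    }
}

/// Encode every column on its own thread.
/// `batches[i][j]` is the leaf for column `j` of batch `i`.
/// Returns one encoded chunk per column, in column order.
pub fn encode_columns_in_parallel(batches: Vec<Vec<Leaf>>, num_columns: usize) -> Vec<Vec<u8>> {
    // One worker thread per column; each worker owns its encoder.
    let workers: Vec<_> = (0..num_columns)
        .map(|_| {
            let (tx, rx) = mpsc::channel::<Leaf>();
            let mut encoder = ColumnEncoder::new();
            let handle = thread::spawn(move || {
                // Runs until the sending side is dropped.
                for leaf in rx {
                    encoder.write(&leaf);
                }
                encoder.close()
            });
            (handle, tx)
        })
        .collect();

    // Fan each batch out: leaf `j` goes to worker `j`.
    for batch in batches {
        assert_eq!(batch.len(), num_columns);
        for (leaf, (_, tx)) in batch.into_iter().zip(&workers) {
            tx.send(leaf).expect("worker hung up");
        }
    }

    // Drop each sender so its worker finishes, then collect chunks in order.
    workers
        .into_iter()
        .map(|(handle, tx)| {
            drop(tx);
            handle.join().expect("worker panicked")
        })
        .collect()
}
```

A row-group-level writer that owns all of its column writers cannot be split up this way, which is why it only gives row-group-level parallelism.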