alamb opened a new issue, #4823: URL: https://github.com/apache/arrow-rs/issues/4823
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

In DataFusion, @devinjdangelo is using the [`append_column`](https://docs.rs/parquet/latest/parquet/file/writer/struct.SerializedRowGroupWriter.html#method.append_column) API to write parquet files in parallel (https://github.com/apache/arrow-datafusion/pull/7562). However, copying the `RowGroupMetaData` into that API, so that bloom filters, page offset indexes, and other metadata are carried over, is awkward.

**Describe the solution you'd like**

I would like a way to call the `append_column` API given a [`RowGroupMetaData`](https://docs.rs/parquet/latest/parquet/file/metadata/struct.RowGroupMetaData.html) object from the existing file. Ideally there would be an API that produced a [`ColumnCloseResult`](https://docs.rs/parquet/latest/parquet/column/writer/struct.ColumnCloseResult.html) from a `RowGroupMetaData`, or some convenience API that took a reader plus a `RowGroupMetaData` from another file and did the necessary copy.

Perhaps something like:

```rust
impl SerializedRowGroupWriter {
    ...

    /// Appends an entire row group from the specified reader, including all
    /// metadata, to the in-progress parquet file.
    pub fn append_row_group(&mut self, rg: Box<dyn RowGroupReader>) -> Result<...> {
        ...
    }
}
```
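For illustration, a caller-side sketch of how such an API might be used. `append_row_group` is the hypothetical method proposed above; `SerializedFileReader`, `SerializedFileWriter`, and `next_row_group` are existing `parquet` crate APIs:

```rust
// Sketch only: copy every row group from an existing file into a new one,
// preserving bloom filters, page indexes, and other column metadata.
let reader = SerializedFileReader::new(input)?;
let mut writer = SerializedFileWriter::new(output, schema, props)?;

for i in 0..reader.metadata().num_row_groups() {
    let mut rg_writer = writer.next_row_group()?;
    // Proposed API: copies column data and row group metadata wholesale,
    // instead of reconstructing a ColumnCloseResult per column by hand.
    rg_writer.append_row_group(reader.get_row_group(i)?)?;
    rg_writer.close()?;
}
writer.close()?;
```

This would let the parallel-write path in the DataFusion PR concatenate independently written files without losing per-column metadata.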
