devinjdangelo commented on code in PR #4859:
URL: https://github.com/apache/arrow-rs/pull/4859#discussion_r1338448785
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -347,13 +349,22 @@ impl PageWriter for ArrowPageWriter {
}
}
-/// Encodes a leaf column to [`ArrowPageWriter`]
-enum ArrowColumnWriter {
+/// Serializes [ArrayRef]s to [ArrowColumnChunk]s which can be concatenated
+/// to form a parquet row group
+pub enum ArrowColumnWriter {
ByteArray(GenericColumnWriter<'static, ByteArrayEncoder>),
Column(ColumnWriter<'static>),
}
impl ArrowColumnWriter {
+ /// Serializes an [ArrayRef] to a [ArrowColumnChunk] for an in progress row group.
+ pub fn write(&mut self, array: ArrayRef, field: Arc<Field>) -> Result<()> {
+ let mut levels = calculate_array_levels(&array, &field)?.into_iter();
+ let mut writer_iter = std::iter::once(self);
Review Comment:
Got it, makes sense. Perhaps we could have something like:
```rust
pub struct ArrowColumnWriter(Vec<ArrowColumnWriterImpl>);
enum ArrowColumnWriterImpl {
...
}
```
which for a non-nested column would contain only one `ArrowColumnWriterImpl`,
but could hold multiple in the case of nested columns?
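
To make the idea concrete, here is a rough, self-contained sketch of that wrapper shape. It is not the arrow-rs API: the `Vec` buffers stand in for the real `GenericColumnWriter` / `ColumnWriter` variants, and `LeafValues`, `flat_ints`, `struct_of_ints_and_bytes`, `num_leaves` and `leaf_lens` are hypothetical names invented only for illustration. The point is just that the public type owns one private writer per leaf and `write` zips the incoming leaves against them.

```rust
// Stand-ins for the two leaf encoders. In the PR these variants wrap
// GenericColumnWriter<'static, ByteArrayEncoder> and ColumnWriter<'static>;
// plain buffers are used here so the sketch compiles on its own.
enum ArrowColumnWriterImpl {
    ByteArray(Vec<Vec<u8>>),
    Column(Vec<i64>),
}

// Hypothetical per-leaf payload, standing in for the computed levels/values
// of one leaf column.
pub enum LeafValues {
    Bytes(Vec<Vec<u8>>),
    Ints(Vec<i64>),
}

// Public wrapper: one inner writer per leaf column.
pub struct ArrowColumnWriter(Vec<ArrowColumnWriterImpl>);

impl ArrowColumnWriter {
    // A non-nested column has exactly one leaf writer.
    pub fn flat_ints() -> Self {
        Self(vec![ArrowColumnWriterImpl::Column(Vec::new())])
    }

    // A nested column (here a struct with two fields) has one writer per leaf.
    pub fn struct_of_ints_and_bytes() -> Self {
        Self(vec![
            ArrowColumnWriterImpl::Column(Vec::new()),
            ArrowColumnWriterImpl::ByteArray(Vec::new()),
        ])
    }

    pub fn num_leaves(&self) -> usize {
        self.0.len()
    }

    // Number of values buffered so far in each leaf writer.
    pub fn leaf_lens(&self) -> Vec<usize> {
        self.0
            .iter()
            .map(|w| match w {
                ArrowColumnWriterImpl::ByteArray(b) => b.len(),
                ArrowColumnWriterImpl::Column(c) => c.len(),
            })
            .collect()
    }

    // Routes each leaf to its writer; the caller supplies one entry per leaf.
    // Note: in this sketch a type mismatch on a later leaf leaves the earlier
    // leaves already written.
    pub fn write(&mut self, leaves: Vec<LeafValues>) -> Result<(), String> {
        if leaves.len() != self.0.len() {
            return Err(format!(
                "expected {} leaves, got {}",
                self.0.len(),
                leaves.len()
            ));
        }
        for (writer, leaf) in self.0.iter_mut().zip(leaves) {
            match (writer, leaf) {
                (ArrowColumnWriterImpl::ByteArray(buf), LeafValues::Bytes(v)) => buf.extend(v),
                (ArrowColumnWriterImpl::Column(buf), LeafValues::Ints(v)) => buf.extend(v),
                _ => return Err("leaf type does not match writer".to_string()),
            }
        }
        Ok(())
    }
}
```

With this shape callers would never see `ArrowColumnWriterImpl`, so the `std::iter::once(self)` in `write` could presumably become an iterator over the inner `Vec`.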