ZENOTME opened a new issue, #34:
URL: https://github.com/apache/iceberg-rust/issues/34
This issue proposes a writer design to address:
> arrow: Writing unpartitioned data into iceberg from arrow record batches
> arrow: Writing partitioned data into iceberg from arrow record batches
The design is based on what we do in icelake and is inspired by Java
Iceberg. Any suggestions are welcome:
## Class Design
### SpecificFormatWriter
At the bottom level, we have several format-specific writers, each
responsible for writing record batches into a file of a specific format, such as:
```
struct ParquetWriter {
    ...
}
struct AvroWriter {
    ...
}
struct OrcWriter {
    ...
}
/// Implement this trait for the writers above.
trait SpecificWriter {
    // Takes `&mut self` because writing advances the writer's internal state.
    fn write(&mut self, batch: &RecordBatch) -> Result<()>;
}
```
**1. Discussion: Which formats do we plan to support in v0.2? I guess only
Parquet?**
### DataFileWriter
One level up is the data file writer. It uses a SpecificWriter and splits
the record batches into multiple files according to config such as
`file_size_limit`. It looks like:
```
struct DataFileWriter {
    current_specific_writer: SpecificWriter,
}
```
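To make the rolling behaviour concrete, here is a minimal sketch of one way it could work. Everything in it is an assumption for illustration: `RecordBatch` is a stand-in that only carries a byte count (not the arrow type), `bytes_written`, `CountingWriter`, `roll`, and `close` are hypothetical names, and the generic parameter is used only to keep the sketch compilable, not to settle the dispatch question. The sketch rolls to a new file once the current one reaches `file_size_limit`; it does not split a single batch across files.

```rust
/// Stand-in for arrow's RecordBatch: only the byte size matters here (assumption).
pub struct RecordBatch {
    pub num_bytes: usize,
}

/// Hypothetical minimal format writer; a real one would encode Parquet, etc.
pub trait SpecificWriter {
    fn write(&mut self, batch: &RecordBatch);
    /// Bytes written to the current file so far (hypothetical method).
    fn bytes_written(&self) -> usize;
}

/// In-memory writer that only counts bytes, used to exercise the sketch.
#[derive(Default)]
pub struct CountingWriter {
    bytes: usize,
}

impl SpecificWriter for CountingWriter {
    fn write(&mut self, batch: &RecordBatch) {
        self.bytes += batch.num_bytes;
    }
    fn bytes_written(&self) -> usize {
        self.bytes
    }
}

pub struct DataFileWriter<W: SpecificWriter + Default> {
    current_specific_writer: W,
    file_size_limit: usize,
    /// Sizes of the files that have already been closed.
    closed_file_sizes: Vec<usize>,
}

impl<W: SpecificWriter + Default> DataFileWriter<W> {
    pub fn new(file_size_limit: usize) -> Self {
        Self {
            current_specific_writer: W::default(),
            file_size_limit,
            closed_file_sizes: Vec::new(),
        }
    }

    /// Write a batch, then start a new file if the limit has been reached.
    pub fn write(&mut self, batch: &RecordBatch) {
        self.current_specific_writer.write(batch);
        if self.current_specific_writer.bytes_written() >= self.file_size_limit {
            self.roll();
        }
    }

    /// Close the current file and replace it with a fresh writer.
    fn roll(&mut self) {
        let finished = std::mem::take(&mut self.current_specific_writer);
        self.closed_file_sizes.push(finished.bytes_written());
    }

    /// Flush any non-empty trailing file and return the size of every file.
    pub fn close(mut self) -> Vec<usize> {
        if self.current_specific_writer.bytes_written() > 0 {
            self.roll();
        }
        self.closed_file_sizes
    }
}
```

With a limit of 100 bytes and batches of 60, 60, and 30 bytes, this sketch closes the first file after the second batch and leaves the third batch in a second file.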
**2. Discussion: How do we treat the type SpecificWriter: use an enum to
dispatch, or use a generic parameter?**
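For comparison, the two options might look roughly like the sketch below. It is only an illustration under stated assumptions: `RecordBatch` is a stand-in with a row count, the writers just count rows, errors are plain `String`s, and `AnyWriter`, `EnumDataFileWriter`, and `GenericDataFileWriter` are hypothetical names.

```rust
/// Stand-in for arrow's RecordBatch (assumption: only the row count is modelled).
pub struct RecordBatch {
    pub num_rows: usize,
}

pub trait SpecificWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String>;
}

/// Toy writers that only count rows; real ones would encode the file format.
#[derive(Default)]
pub struct ParquetWriter {
    pub rows: usize,
}

#[derive(Default)]
pub struct AvroWriter {
    pub rows: usize,
}

impl SpecificWriter for ParquetWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.rows += batch.num_rows;
        Ok(())
    }
}

impl SpecificWriter for AvroWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.rows += batch.num_rows;
        Ok(())
    }
}

// Option A: enum dispatch. The format is chosen at runtime and the set of
// formats is closed; every new format adds a variant and a match arm.
pub enum AnyWriter {
    Parquet(ParquetWriter),
    Avro(AvroWriter),
}

impl SpecificWriter for AnyWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        match self {
            AnyWriter::Parquet(w) => w.write(batch),
            AnyWriter::Avro(w) => w.write(batch),
        }
    }
}

pub struct EnumDataFileWriter {
    pub current_specific_writer: AnyWriter,
}

// Option B: generic parameter. The format is fixed at compile time and the
// set of formats is open, but the type parameter spreads to every user.
pub struct GenericDataFileWriter<W: SpecificWriter> {
    pub current_specific_writer: W,
}
```

The enum keeps `DataFileWriter` a single concrete type, which is convenient when the format comes from table properties at runtime. The generic version lets downstream crates plug in their own writers without touching this crate.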
### PartitionWriter and UnpartitionWriter
The top level is PartitionWriter and UnpartitionWriter.
UnpartitionWriter is just similar to DataFileWriter. PartitionWriter
needs to split each record batch into groups according to partition, and
each group is then written using the DataFileWriter responsible for that
partition. It looks like:
```
struct PartitionWriter {
    // Field name is illustrative; one DataFileWriter per partition.
    writers: HashMap<Partition, DataFileWriter>,
}
```
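The fan-out step could be sketched as follows. This is a simplified illustration, not the proposed implementation: a batch is modelled as a slice of `(partition, value)` rows rather than an arrow record batch, the partition key is a plain `String`, `DataFileWriter` just collects values in memory, and `rows_in` is a hypothetical helper for inspecting the result.

```rust
use std::collections::HashMap;

/// Stand-in row: (partition value, payload). A real batch would be columnar.
pub type Row = (String, i64);

/// Toy data file writer that collects the values it receives.
#[derive(Default)]
pub struct DataFileWriter {
    pub rows: Vec<i64>,
}

impl DataFileWriter {
    pub fn write(&mut self, values: &[i64]) {
        self.rows.extend_from_slice(values);
    }
}

#[derive(Default)]
pub struct PartitionWriter {
    /// One DataFileWriter per partition, created lazily on first use.
    writers: HashMap<String, DataFileWriter>,
}

impl PartitionWriter {
    pub fn write(&mut self, batch: &[Row]) {
        // Step 1: split the incoming batch into one group per partition.
        let mut groups: HashMap<&str, Vec<i64>> = HashMap::new();
        for (partition, value) in batch {
            groups.entry(partition.as_str()).or_default().push(*value);
        }
        // Step 2: hand each group to the writer responsible for that partition.
        for (partition, values) in groups {
            self.writers
                .entry(partition.to_string())
                .or_default()
                .write(&values);
        }
    }

    /// Hypothetical helper: values written so far for a partition, if any.
    pub fn rows_in(&self, partition: &str) -> Option<&[i64]> {
        self.writers.get(partition).map(|w| w.rows.as_slice())
    }
}
```

Grouping before writing means each per-partition writer receives one call per batch instead of one call per row, and row order within a partition is preserved.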
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]