ZENOTME opened a new issue, #34:
URL: https://github.com/apache/iceberg-rust/issues/34

   This issue proposes a writer design to address:
   > arrow: Writing unpartitioned data into iceberg from arrow record batches
   > arrow: Writing partitioned data into iceberg from arrow record batches
   
   The design is based on what we do in icelake and is inspired by Java 
Iceberg. Suggestions are welcome:
   
   ## Class Design
   
   ### SpecificFormatWriter
   
   At the bottom level, we have format-specific writers, each 
responsible for writing record batches into a file of a specific format, such as:
   
   ```
   struct ParquetWriter {
       ...
   }
   
   struct AvroWriter {
       ...
   }
   
   struct OrcWriter {
       ...
   }
   
   /// Implement this trait for each of the writers above.
   trait SpecificWriter {
       fn write(&mut self, batch: &RecordBatch) -> Result<()>;
   }
   ```
   
   **1. Discussion: Which formats do we plan to support in v0.2? I guess only 
Parquet?**
   
   ### DataFileWriter
   
   The next level up is the data file writer. It uses a SpecificWriter 
and splits the record batches into multiple files according to configuration 
such as `file_size_limit`. It looks like:
   
   ```
   struct DataFileWriter {
       current_specific_writer: SpecificWriter
   }
   ```
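
As a rough illustration of the rolling behaviour described above, here is a minimal sketch that starts a new file once the current one reaches `file_size_limit`. Everything other than `DataFileWriter`, `SpecificWriter`, `current_specific_writer` and `file_size_limit` is an assumption made so the sketch compiles on its own: `RecordBatch` is a stand-in for the arrow type, `MemoryWriter` stands in for a real format writer, and `bytes_written`, `closed_files` and `close` are hypothetical names. For simplicity it rolls only at batch boundaries and does not split a single batch across files.

```rust
// Stand-in for arrow's RecordBatch (assumption: lets the sketch build without dependencies).
struct RecordBatch {
    rows: Vec<i64>,
}

impl RecordBatch {
    // In-memory size, used here as a rough proxy for bytes written to a file.
    fn size_in_bytes(&self) -> usize {
        self.rows.len() * std::mem::size_of::<i64>()
    }
}

trait SpecificWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String>;
    // Hypothetical accessor so the data file writer can decide when to roll.
    fn bytes_written(&self) -> usize;
}

// Hypothetical in-memory writer standing in for something like ParquetWriter.
#[derive(Default)]
struct MemoryWriter {
    bytes: usize,
}

impl SpecificWriter for MemoryWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.bytes += batch.size_in_bytes();
        Ok(())
    }

    fn bytes_written(&self) -> usize {
        self.bytes
    }
}

struct DataFileWriter<W: SpecificWriter + Default> {
    file_size_limit: usize,
    current_specific_writer: W,
    // Sizes of the files that have already been finished.
    closed_files: Vec<usize>,
}

impl<W: SpecificWriter + Default> DataFileWriter<W> {
    fn new(file_size_limit: usize) -> Self {
        Self {
            file_size_limit,
            current_specific_writer: W::default(),
            closed_files: Vec::new(),
        }
    }

    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.current_specific_writer.write(batch)?;
        // Roll over to a fresh writer once the current file is large enough.
        if self.current_specific_writer.bytes_written() >= self.file_size_limit {
            let finished = std::mem::take(&mut self.current_specific_writer);
            self.closed_files.push(finished.bytes_written());
        }
        Ok(())
    }

    // Finish the last (possibly partial) file and return all file sizes.
    fn close(mut self) -> Vec<usize> {
        let remaining = self.current_specific_writer.bytes_written();
        if remaining > 0 {
            self.closed_files.push(remaining);
        }
        self.closed_files
    }
}
```

A real implementation would presumably return data file metadata rather than sizes; the sizes here only make the rollover visible.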
   
   **2. Discussion: How do we treat the SpecificWriter type: use an enum to 
dispatch, or use a generic parameter?**
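
To make the two options concrete, here is a small sketch of both side by side. It is not a recommendation. `RecordBatch` is a stand-in for the arrow type, the writer bodies only count rows, and the names `AnyWriter`, `EnumDataFileWriter`, `GenericDataFileWriter` and `rows_written` are hypothetical.

```rust
// Stand-in for arrow's RecordBatch (assumption for a dependency-free sketch).
struct RecordBatch {
    num_rows: usize,
}

trait SpecificWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String>;
    // Hypothetical accessor, only here so the effect of write is observable.
    fn rows_written(&self) -> usize;
}

#[derive(Default)]
struct ParquetWriter {
    rows: usize,
}

#[derive(Default)]
struct AvroWriter {
    rows: usize,
}

impl SpecificWriter for ParquetWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.rows += batch.num_rows;
        Ok(())
    }
    fn rows_written(&self) -> usize {
        self.rows
    }
}

impl SpecificWriter for AvroWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        self.rows += batch.num_rows;
        Ok(())
    }
    fn rows_written(&self) -> usize {
        self.rows
    }
}

// Option A: enum dispatch. The format can be chosen at runtime and
// the data file writer needs no type parameter.
enum AnyWriter {
    Parquet(ParquetWriter),
    Avro(AvroWriter),
}

impl SpecificWriter for AnyWriter {
    fn write(&mut self, batch: &RecordBatch) -> Result<(), String> {
        match self {
            AnyWriter::Parquet(w) => w.write(batch),
            AnyWriter::Avro(w) => w.write(batch),
        }
    }
    fn rows_written(&self) -> usize {
        match self {
            AnyWriter::Parquet(w) => w.rows_written(),
            AnyWriter::Avro(w) => w.rows_written(),
        }
    }
}

struct EnumDataFileWriter {
    current_specific_writer: AnyWriter,
}

// Option B: generic parameter. The format is fixed at compile time and
// new formats can be added without touching a central enum.
struct GenericDataFileWriter<W: SpecificWriter> {
    current_specific_writer: W,
}
```

With the enum, callers can pick the format from table properties at runtime; with the generic parameter, the type parameter propagates into every type that holds a `DataFileWriter`.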
   
   ### PartitionWriter and UnpartitionWriter
   
   The top level is PartitionWriter and UnpartitionWriter. 
UnpartitionWriter is similar to DataFileWriter. PartitionWriter needs to 
split each record batch into groups according to partition; each group is 
then written by the DataFileWriter responsible for that partition. It 
looks like:
   
   ```
   struct PartitionWriter {
       // Field name is illustrative: one DataFileWriter per partition.
       writers: HashMap<Partition, DataFileWriter>,
   } 
   ```
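
The fan-out step could be sketched as follows. This is a simplified illustration, not the proposed implementation: `Partition` is assumed to be a plain string key, `RecordBatch` is a stand-in whose rows already carry their partition value, and the `DataFileWriter` here is a hypothetical stub that only buffers values. In practice the partition value would be computed from the batch columns using the table's partition spec.

```rust
use std::collections::HashMap;

// Assumption: a partition is identified by a simple string key.
type Partition = String;

// Stand-in for arrow's RecordBatch: each row is (partition value, payload).
struct RecordBatch {
    rows: Vec<(Partition, i64)>,
}

// Hypothetical stub for the data file writer; it only buffers values.
#[derive(Default)]
struct DataFileWriter {
    rows: Vec<i64>,
}

impl DataFileWriter {
    fn write(&mut self, values: &[i64]) {
        self.rows.extend_from_slice(values);
    }
}

#[derive(Default)]
struct PartitionWriter {
    writers: HashMap<Partition, DataFileWriter>,
}

impl PartitionWriter {
    fn write(&mut self, batch: &RecordBatch) {
        // Step 1: split the batch into one group per partition,
        // preserving row order within each group.
        let mut groups: HashMap<&Partition, Vec<i64>> = HashMap::new();
        for (partition, value) in &batch.rows {
            groups.entry(partition).or_default().push(*value);
        }

        // Step 2: hand each group to the writer for its partition,
        // creating that writer on first use.
        for (partition, values) in groups {
            self.writers
                .entry(partition.clone())
                .or_default()
                .write(&values);
        }
    }
}
```

Creating writers lazily keeps memory proportional to the partitions actually seen, though a table with many partitions would still hold many open writers at once.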
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

