liurenjie1024 commented on issue #1650:
URL: https://github.com/apache/iceberg-rust/issues/1650#issuecomment-3265264426

   > Hi [@liurenjie1024](https://github.com/liurenjie1024) , thanks for the 
inputs! Your idea sounds good to me and I agree that we should make smaller 
steps if possible. Next I'll try to make a draft based on it!
   > 
   > One thing I'm not sure about with the `PartitioningWriter` interface is 
that the incoming `batch` may still contain rows from different partitions:
   > 
   > pub trait PartitioningWriter {
   > // if `batch` here contains data from multiple partitions, 
   > // then the entire batch would still be written to the partition of 
`partition_key`
   >     fn write(&self, partition_key: PartitionKey, batch: RecordBatch);
   > }
   > I'm thinking of something like this:
   > 
   > pub trait PartitioningWriter {
   >     // use record batch splitter to split the incoming batch first
   >     fn write(&self, batch: RecordBatch); 
   >     // the `batch` here should already be split
   >     // technically this shouldn't be public accessible
   >     fn do_write(&self, partition_key: PartitionKey, splitted_batch: 
RecordBatch);
   > }
   > Please lmk your thoughts!
   
   Hi @CTTY, the reason we need `partition_key` in the interface is that in 
some frameworks and physical plans the partition key may be computed 
elsewhere, not in the writer node. I don't think we need to change 
`DataFileWriter`. The class hierarchy would look like this:
   ```
   struct TaskWriter {
        datafile_writer: PartitionedWriter
   }
   
   struct ClusteredPartitioningDataWriter {...}
   
   impl PartitioningWriter for ClusteredPartitioningDataWriter {}
   
   struct FanoutPartitioningDataWriter {
      writers: HashMap<Struct, DataFileWriter>
   ....
   }
   
   impl PartitioningWriter for FanoutPartitioningDataWriter {}
   
   ```
   
   I made a small change so that `PartitionedWriter` contains `DataFileWriter` 
rather than `FileWriter`.
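
   To make the hierarchy above concrete, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions, not the actual iceberg-rust design: `PartitionKey`, `RecordBatch`, and `DataFileWriter` are hypothetical stand-ins (a `String`, a `Vec<i64>`, and a struct that just collects rows), the methods take `&mut self` for simplicity, and the clustered writer does not detect a key that reappears after its partition was closed.

   ```rust
   use std::collections::HashMap;

   // Hypothetical stand-ins for the real types, so the sketch compiles on its own.
   type PartitionKey = String;
   type RecordBatch = Vec<i64>;

   /// Stand-in for `DataFileWriter`: it only collects the rows it receives.
   #[derive(Default)]
   struct DataFileWriter {
       rows: Vec<i64>,
   }

   impl DataFileWriter {
       fn write(&mut self, batch: RecordBatch) {
           self.rows.extend(batch);
       }
   }

   /// The caller passes the partition key, so it can be computed outside the writer node.
   trait PartitioningWriter {
       fn write(&mut self, partition_key: PartitionKey, batch: RecordBatch);
   }

   /// Fanout: keeps one open writer per partition, so keys may arrive in any order.
   #[derive(Default)]
   struct FanoutPartitioningDataWriter {
       writers: HashMap<PartitionKey, DataFileWriter>,
   }

   impl PartitioningWriter for FanoutPartitioningDataWriter {
       fn write(&mut self, partition_key: PartitionKey, batch: RecordBatch) {
           self.writers.entry(partition_key).or_default().write(batch);
       }
   }

   /// Clustered: assumes input is grouped by partition and keeps a single open writer,
   /// closing it when the key changes.
   #[derive(Default)]
   struct ClusteredPartitioningDataWriter {
       current: Option<(PartitionKey, DataFileWriter)>,
       closed: Vec<(PartitionKey, DataFileWriter)>,
   }

   impl PartitioningWriter for ClusteredPartitioningDataWriter {
       fn write(&mut self, partition_key: PartitionKey, batch: RecordBatch) {
           let same_key = matches!(&self.current, Some((k, _)) if *k == partition_key);
           if !same_key {
               // Roll over: close the previous partition's writer, open a new one.
               if let Some(prev) = self.current.take() {
                   self.closed.push(prev);
               }
               self.current = Some((partition_key, DataFileWriter::default()));
           }
           if let Some((_, writer)) = self.current.as_mut() {
               writer.write(batch);
           }
       }
   }

   /// The task writer only depends on the trait, not on a concrete strategy.
   struct TaskWriter<W: PartitioningWriter> {
       datafile_writer: W,
   }

   impl<W: PartitioningWriter> TaskWriter<W> {
       fn write(&mut self, partition_key: PartitionKey, batch: RecordBatch) {
           self.datafile_writer.write(partition_key, batch);
       }
   }

   fn main() {
       let mut task = TaskWriter {
           datafile_writer: FanoutPartitioningDataWriter::default(),
       };
       task.write("a".to_string(), vec![1, 2]);
       task.write("b".to_string(), vec![3]);
       task.write("a".to_string(), vec![4]);
       println!("open partitions: {}", task.datafile_writer.writers.len());
   }
   ```

   Because `TaskWriter` is generic over `PartitioningWriter`, swapping the fanout strategy for the clustered one does not touch the caller, and a batch splitter could sit in front of either without changing the trait.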


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

