[GitHub] [iceberg] openinx commented on pull request #1774: [iceberg-1746] Implement spark fanout writer

GitBox Wed, 25 Nov 2020 01:06:40 -0800


openinx commented on pull request #1774:
URL: https://github.com/apache/iceberg/pull/1774#issuecomment-733569256



   I removed the `PartitionedFanoutWriter` in #1818 because:
   1.  I found the code easier and simpler to understand after unifying the
unpartitioned and partitioned fanout writers into a single
[RowDataTaskWriter](https://github.com/apache/iceberg/pull/1818/files#diff-137cbe4278e90eab7d4d545be87f5daf929e48a012f1c791ca1e7fc7d7fe5eddR41).
 
   2.  Flink needs to parse the `RowKind` to decide whether a row should
be dispatched to the `write` method or the `delete` method. The previous
abstraction is more suitable for that requirement, so I created a unified
task writer for Flink.
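The dispatch described in point 2 can be sketched roughly as follows. This is only an illustration, not the code from #1818: `RowKindDispatchSketch`, its nested `RowKind` enum, and `Row` are hypothetical stand-ins (the enum mirrors the value names of Flink's `RowKind` but is not the real class), and the mapping of `UPDATE_BEFORE` to `delete` and `UPDATE_AFTER` to `write` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a unified task writer that routes rows by change kind.
final class RowKindDispatchSketch {
  // Stand-in for Flink's RowKind (same value names, not the real class).
  enum RowKind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

  // Hypothetical row: a change kind plus an opaque payload.
  static final class Row {
    final RowKind kind;
    final String data;

    Row(RowKind kind, String data) {
      this.kind = kind;
      this.data = data;
    }
  }

  // In-memory lists stand in for data files and delete files.
  final List<String> written = new ArrayList<>();
  final List<String> deleted = new ArrayList<>();

  // Inspect the row kind and forward the payload to write or delete.
  void dispatch(Row row) {
    switch (row.kind) {
      case INSERT:
      case UPDATE_AFTER:
        write(row.data);
        break;
      case UPDATE_BEFORE:
      case DELETE:
        delete(row.data);
        break;
      default:
        throw new UnsupportedOperationException("Unknown row kind: " + row.kind);
    }
  }

  void write(String data) {
    written.add(data);
  }

  void delete(String data) {
    deleted.add(data);
  }
}
```

Keeping the dispatch in one writer means the caller does not need to know whether the table is partitioned; it only hands over rows.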
   
   For the Spark fanout task writer, I think it's reasonable for Spark
streaming scenarios, because in that case it is not necessary to shuffle the
records based on partition keys. Moving the `PartitionedFanoutWriter` from
the `flink` module to the `core` module looks good to me.
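To illustrate why a fanout writer removes the need for a shuffle, here is a minimal sketch. It is not the actual `PartitionedFanoutWriter`: the class `FanoutWriterSketch`, its methods, and the use of in-memory lists in place of open data files are all assumptions made for the example. The idea shown is that one writer stays open per partition key, so records may arrive in any order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the fanout idea: one open writer per partition key.
final class FanoutWriterSketch {
  // Each list stands in for an open data file belonging to one partition.
  private final Map<String, List<String>> openWriters = new LinkedHashMap<>();

  // Route the record to its partition's writer, opening one on first use.
  void write(String partitionKey, String record) {
    openWriters.computeIfAbsent(partitionKey, k -> new ArrayList<>()).add(record);
  }

  // Number of writers currently held open; grows with distinct partitions seen.
  int openWriterCount() {
    return openWriters.size();
  }

  // Close all writers and return what was written per partition.
  Map<String, List<String>> close() {
    Map<String, List<String>> result = new LinkedHashMap<>(openWriters);
    openWriters.clear();
    return result;
  }
}
```

The trade-off in this sketch is memory: the writer holds one open file per distinct partition seen in the task, in exchange for accepting input that is not clustered by partition key.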
   
   @XuQianJin-Stars Would you mind updating this PR to address the CI issue?
   
   Thanks.




