stevenzwu commented on issue #2918:
URL: https://github.com/apache/iceberg/issues/2918#issuecomment-903407782


   @Reo-LEI Thanks a lot for the detailed explanations. I understand the problem and 
the motivation for this change. 
   
   > Case2: x != y == z
   
   This PR changes the parallelism of the upstream operator to force x == y. To 
me, it is dangerous for FlinkSink to modify something it doesn't own; that violates 
the principle of ownership/isolation. I would argue that for the CDC upsert 
case, the job parallelism should be set to the CDC source parallelism (x). It 
doesn't make sense to have a job parallelism (y) different from the CDC source 
parallelism (x) and then have FlinkSink do some magic to override the 
parallelism of an upstream operator that it doesn't own.
   
   > Case3: x == y != z
   
   This can happen if we need a higher parallelism for the Flink writer. I can 
see that we may want to handle this case in FlinkSink. I am wondering if we 
should add a new `equalityKeysHash` mode to `DistributionMode`.
   
   However, I am personally not sure how valuable this setup will be. It 
assumes that the Flink writers are the bottleneck and that job throughput can 
improve significantly with a higher Iceberg writer parallelism.
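   To make the `equalityKeysHash` idea a bit more concrete, here is a minimal conceptual sketch in Python (not Flink or Iceberg code). It assumes a hypothetical CDC changelog of `(RowKind, key, value)` tuples and compares a key-agnostic round-robin hand-off (roughly what Flink's default rebalance does when two connected operators have different parallelism) against routing by a hash of the equality key. The function names and the crc32-based routing are illustrative assumptions; Flink's real `keyBy` uses its own key-group hashing.

```python
import zlib
from collections import defaultdict

# Hypothetical CDC changelog in source order: (RowKind, equality key, value).
CHANGES = [
    ("+I", "id=1", "a"),
    ("+I", "id=2", "b"),
    ("-U", "id=1", "a"),
    ("+U", "id=1", "c"),
    ("-D", "id=2", "b"),
]


def round_robin(changes, parallelism):
    """Hand records to writer subtasks in turn, ignoring the key (like a rebalance)."""
    routed = defaultdict(list)
    for i, change in enumerate(changes):
        routed[i % parallelism].append(change)
    return dict(routed)


def equality_key_hash(changes, parallelism):
    """Send every record with the same equality key to the same writer subtask."""
    routed = defaultdict(list)
    for change in changes:
        key = change[1]
        # crc32 is stable across runs, unlike Python's salted str hash.
        subtask = zlib.crc32(key.encode("utf-8")) % parallelism
        routed[subtask].append(change)
    return dict(routed)


def writers_per_key(routed):
    """Map each equality key to the set of writer subtasks that received it."""
    seen = defaultdict(set)
    for subtask, changes in routed.items():
        for _, key, _ in changes:
            seen[key].add(subtask)
    return dict(seen)


if __name__ == "__main__":
    z = 3  # hypothetical writer parallelism
    print("round robin:", writers_per_key(round_robin(CHANGES, z)))
    print("key hash:   ", writers_per_key(equality_key_hash(CHANGES, z)))
```

   With round robin, the changes for `id=1` end up on two different writer subtasks, so their relative order is no longer guaranteed; with key hashing, each key maps to exactly one writer regardless of the value of z.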
   
   > Case4: x != y != z
   
   It's a combination of Cases 2 and 3, so the individual arguments above apply 
separately here.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
