stevenzwu commented on issue #2918: URL: https://github.com/apache/iceberg/issues/2918#issuecomment-903407782
@Reo-LEI Thanks a lot for the detailed explanations. I understand the problem and the motivation for this change.

> Case2: x != y == z

This PR changes the parallelism of the upstream operator to force x=y. To me, it is dangerous for FlinkSink to modify something that it doesn't own. It violates the principle of ownership/isolation. I would argue that for the CDC upsert case, job parallelism should be set to the CDC source parallelism (x). It doesn't make sense to have a job parallelism (y) different from the CDC source parallelism (x) and then have the FlinkSink do some magic to override the parallelism of an upstream operator that it doesn't own.

> Case3: x == y != z

This can happen if we need a higher parallelism for the Flink writer. I can see that we may want to handle this case in the FlinkSink. I am wondering if we should add a new `equalityKeysHash` to the `DistributionMode`. However, I am personally not sure how valuable this setup would be. It assumes that the Flink writers are the bottleneck and that job throughput can improve significantly with higher Iceberg writer parallelism.

> Case4: x != y != z

It's a combination of Cases 2 and 3, so the individual arguments apply separately here.
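
As a rough illustration of what an equality-key hash distribution for Case 3 would need to guarantee, here is a minimal Python sketch. It does not use any Flink or Iceberg API: `writer_for` and `route` are hypothetical names, and CRC32 is only a stand-in for whatever hash the engine would actually use. The point is that the writer subtask is chosen from the equality key alone, so all changes for a key reach the same writer, in order, even when the writer parallelism (z) differs from the source parallelism (x).

```python
import zlib
from collections import defaultdict


def writer_for(equality_key, writer_parallelism):
    """Pick a writer subtask index from the equality key alone.

    Hypothetical routing rule: CRC32 is used because it is stable across
    processes, unlike Python's salted built-in hash() for strings.
    """
    digest = zlib.crc32(repr(equality_key).encode("utf-8"))
    return digest % writer_parallelism


def route(changes, writer_parallelism):
    """Group (key, op) change events by the writer subtask chosen for the key.

    Events are appended in arrival order, so the per-key order seen by a
    writer matches the order in which the source emitted them.
    """
    assignment = defaultdict(list)
    for key, op in changes:
        assignment[writer_for(key, writer_parallelism)].append((key, op))
    return dict(assignment)


# Example CDC stream: inserts, an update pair, and a delete for two keys.
changes = [(1, "+I"), (2, "+I"), (1, "-U"), (1, "+U"), (2, "-D")]
routed = route(changes, writer_parallelism=4)
```

With this kind of routing, raising the writer parallelism changes only how many keys each writer handles; it never splits one key's changes across writers.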
