kbendick commented on PR #4541: URL: https://github.com/apache/iceberg/pull/4541#issuecomment-1096443213
> @kbendick After diving deeper into Flink: for batch-mode jobs, the records emitted to a `keyBy` node will be sorted. In Iceberg, records written to a table with identifier fields are always distributed with `keyBy` on the identifier fields. And I believe the sorter is unstable, so records with the same key can be swapped.
>
> So here all records will be sorted before being emitted into `IcebergStreamWriter`, and records with the same key can be out of order.
>
> So dynamically generated records can break these cases; static records, however, should always be sorted the same way, I think.

I really appreciate the thorough research. We (at least I) are hoping to migrate to some of the newer Flink Table API once we deprecate 1.12. Hopefully that will make things a bit more straightforward (e.g., using `ResolvedSchema` and `LogicalSchema` instead of just `Schema`, things like that).

I'm opening a PR tomorrow to support Flink 1.15 (the release candidate, anyway). There's one test case somewhat in this realm that isn't passing. I'll tag you, and hopefully, if you have time, you can take a look? I'd greatly appreciate it.

Also, feel free to reach out on Slack. I'm not sure what time zone you're in, but I'd love to ask you a few questions.

Thanks again for all the work you've put in, especially recently! @yittg
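P.S. For anyone skimming this thread later, here's a minimal sketch of the batch-mode behavior described above. This is not the Iceberg sink code itself; the class name and the `"a:1"`-style records are made up, with the part before the `:` standing in for an identifier field. The point is that in `BATCH` runtime mode Flink sorts the input of keyed operations by key, and because that sort is not stable, the relative order of records sharing a key is not guaranteed downstream.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByOrderingSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // In BATCH runtime mode, Flink sorts the input of keyed operations by key.
    env.setRuntimeMode(RuntimeExecutionMode.BATCH);

    // Three "records" where the prefix before ':' plays the role of the
    // identifier field. Two of them share the key "a".
    DataStream<String> records = env.fromElements("a:1", "b:2", "a:3");

    records
        // Distribute by the identifier field, analogous to what the Iceberg
        // sink does for tables with identifier fields.
        .keyBy(record -> record.split(":", 2)[0])
        // Stand-in for the writer: in batch mode the two "a" records arrive
        // grouped together (sorted by key), but since the sort is not stable,
        // their relative order ("a:1" vs. "a:3") is not guaranteed.
        .print();

    env.execute("keyBy ordering sketch");
  }
}
```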
