kbendick commented on PR #4541: URL: https://github.com/apache/iceberg/pull/4541#issuecomment-1096443213
> @kbendick After diving deeper into Flink: for batch-mode jobs, the records emitted to a `keyBy` node will be sorted. In Iceberg, records written to a table with identifier fields are always distributed with `keyBy` on the identifier fields. And I believe the sorter is unstable, so records with the same key can be swapped.
>
> So here all records will be sorted before being emitted into `IcebergStreamWriter`, and records with the same key can be out of order.
>
> So dynamically generated records can break these cases; static records, however, should always be sorted the same way, I think.

I really appreciate the thorough research. We (at least I) are hoping to migrate to some of the newer Flink Table API once we deprecate 1.12. Hopefully that will make things a bit more straightforward (e.g., using `ResolvedSchema` and `LogicalSchema` instead of just `Schema`, things like that).

I'm opening a PR tomorrow to support Flink 1.15 (the release candidate, anyway). There's one test case somewhat in this realm that isn't passing. I'll tag you, and hopefully, if you have time, you can take a look? I'd greatly appreciate it.

Also, feel free to reach out on Slack. I'm not sure what time zone you're in, but I'd love to ask you a few questions.

Thanks again for all the work you've put in, especially recently! @yittg
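P.S. For anyone skimming this thread later, here's a minimal sketch of the batch-mode behavior described above. This is not the Iceberg sink code itself; the class name and the `"a:1"`-style records are made up, with the part before the `:` standing in for an identifier field. The point is that in `BATCH` runtime mode Flink sorts the input of keyed operations by key, and because that sort is not stable, the relative order of records sharing a key is not guaranteed downstream.

```java
import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KeyByOrderingSketch {
  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // In BATCH runtime mode, Flink sorts the input of keyed operations by key.
    env.setRuntimeMode(RuntimeExecutionMode.BATCH);

    // Three "records" where the prefix before ':' plays the role of the
    // identifier field. Two of them share the key "a".
    DataStream<String> records = env.fromElements("a:1", "b:2", "a:3");

    records
        // Distribute by the identifier field, analogous to what the Iceberg
        // sink does for tables with identifier fields.
        .keyBy(record -> record.split(":", 2)[0])
        // Stand-in for the writer: in batch mode the two "a" records arrive
        // grouped together (sorted by key), but since the sort is not stable,
        // their relative order ("a:1" vs. "a:3") is not guaranteed.
        .print();

    env.execute("keyBy ordering sketch");
  }
}
```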
