elkhand commented on issue #2033: URL: https://github.com/apache/iceberg/issues/2033#issuecomment-759036441
@openinx Thanks for clarifying that the Flink-specific manifest file is different from the Iceberg manifest file.

**Correction on the issue based on observations:**
- This is not related to S3 bucket versioning.
- This is not related to the S3 bucket policy setup.

Unfortunately, there is nothing in the logs and no failover has happened. The job continues running normally and keeps producing data files and Flink-specific manifest files, but it is not creating the Iceberg manifest file, manifest list file, or metadata file. I'm still investigating the root cause for this Flink job on and off, and will share the findings on this issue.

**A few characteristics of this job:**
- The Flink job consists of 50 disconnected graphs; each graph consumes from a single Kafka topic and ingests into a separate Iceberg table.
- Checkpointing runs every hour.
- Each Iceberg table has date/hour partitioning.

**The issue can be reproduced**

The issue (only Flink-specific manifest files are present, and the Iceberg-specific files are missing) occurs:
- after the job completes several checkpoints successfully
- after the job is suspended (via checkpoint) a few times and started again
- Before the job is shut down and the savepoint is completed (during the suspension flow), Flink produces the Iceberg-specific files that were missing for the last checkpoints and removes the Flink-specific files.

Thanks.
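For anyone trying to confirm the same symptom on their own table, here is a rough sketch of how the file names in a table's `metadata/` directory could be bucketed. The name patterns are assumptions based on common Iceberg naming (`*.metadata.json` for table metadata, `snap-*.avro` for manifest lists, `*-m<N>.avro` for Iceberg manifests); any other `.avro` file is only *guessed* to be a Flink-specific manifest. The function names are hypothetical and not part of Iceberg or Flink.

```python
import re
from collections import Counter

# Assumed naming pattern for Iceberg manifests: "<uuid>-m<N>.avro".
ICEBERG_MANIFEST = re.compile(r"-m\d+\.avro$")


def classify(name: str) -> str:
    """Bucket a metadata-directory file name by its (assumed) naming pattern."""
    if name.endswith(".metadata.json"):
        return "table_metadata"
    if name.startswith("snap-") and name.endswith(".avro"):
        return "manifest_list"
    if ICEBERG_MANIFEST.search(name):
        return "iceberg_manifest"
    if name.endswith(".avro"):
        # Assumption: leftover Avro files are Flink-specific manifests
        # that have not yet been committed into an Iceberg snapshot.
        return "possible_flink_manifest"
    return "unknown"


def summarize(names):
    """Count how many file names fall into each bucket."""
    return Counter(classify(n) for n in names)


def has_possible_flink_manifests(names) -> bool:
    """True if at least one file looks like an uncommitted Flink manifest."""
    return summarize(names)["possible_flink_manifest"] > 0
```

Feeding it the output of an `aws s3 ls` (or any directory listing) for one table before and after a suspension should show whether the `possible_flink_manifest` bucket drains once the Iceberg-specific files appear.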
