pvary opened a new pull request, #7571: URL: https://github.com/apache/iceberg/pull/7571
We found that when the current implementation of the IcebergSource hits a downstream error, it silently retries the planning again and again for as long as the error persists. The only effect on the job is that no new records are emitted; no alarms are raised. This is similar to how the Flink FileSource works, but it is confusing for users. Other sources handle errors differently: Kafka, for example, fails immediately on an error. That is not desirable for our user base either, because we expect more resiliency from our jobs.

Based on our discussion with @zhen-wu2 and @gyula-fora, I have created this PR, which adds the possibility to retry the failed planning and to configure the number of retries in a few different ways:

- `IcebergSource.Builder.planRetryNum` - if the source is created from Java code
- Using the FlinkConfiguration key `connector.iceberg.plan-retry-num`
- Through the read option `plan-retry-num`

The default value is 3, which means that the Flink job fails if the 4th planning attempt fails. If the original behaviour is needed, the value `-1` should be used.
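The retry semantics described in this PR (a default of 3 retries, so the job fails on the 4th failed planning attempt, and `-1` for unlimited retries) can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: the class `PlanRetrySketch` and the method `planWithRetry` are hypothetical names chosen for illustration, and the planner is modelled as a plain `Callable` rather than the actual Iceberg planning call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; not the IcebergSource implementation.
final class PlanRetrySketch {

    // Runs the planner, tolerating up to planRetryNum failed attempts being retried.
    // A negative planRetryNum (for example -1) means "retry forever".
    static <T> T planWithRetry(Callable<T> planner, int planRetryNum) throws Exception {
        int failures = 0;
        while (true) {
            try {
                return planner.call();
            } catch (Exception e) {
                failures++;
                // With planRetryNum = 3, the 4th failure exceeds the limit and is rethrown.
                if (planRetryNum >= 0 && failures > planRetryNum) {
                    throw new RuntimeException(
                        "Planning failed after " + failures + " attempts", e);
                }
            }
        }
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        try {
            // A planner that always fails, standing in for a persistent downstream error.
            PlanRetrySketch.<String>planWithRetry(() -> {
                attempts.incrementAndGet();
                throw new IllegalStateException("downstream error");
            }, 3);
        } catch (Exception e) {
            System.out.println(e.getMessage() + " (planner called "
                + attempts.get() + " times)");
        }
    }
}
```

In this sketch the failure counter is never reset, which is enough to show the "initial attempt plus N retries" arithmetic. Whether the actual source resets the count after a successful planning cycle is not stated in the description above.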
