Hi Flink Devs,

*Description:*

Currently, even when DeliveryGuarantee is set to NONE, the Kafka sink still throws exceptions (for example, when the Kafka cluster has issues), which in turn causes the Flink job to restart. This behavior seems counterintuitive for the NONE guarantee mode, where users typically expect sink errors not to interrupt the job.

*Proposal:*

I would like to propose introducing a new configuration parameter, failOnError, in KafkaSinkBuilder.

- When failOnError is set to true (default), the existing behavior remains unchanged: exceptions from the Kafka cluster will cause the job to fail and restart.
- When failOnError is set to false, the sink would log the error and continue without throwing an exception, thus avoiding job restarts.

This change would give users more control over failure handling, especially in scenarios where the delivery guarantee is already NONE and uninterrupted processing is preferred over strict failure enforcement.

I have already created a Jira issue for this: KAFKA-19547 <https://issues.apache.org/jira/browse/KAFKA-19547?filter=-2>, and I am ready to work on the implementation if it is assigned to me.

Looking forward to your thoughts and feedback on whether this change would be useful for the broader community.

Best regards,
Khan
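
P.S. To make the intended semantics concrete, below is a minimal, self-contained sketch of how the two failOnError modes could behave when a send completes. It is only an illustration under my own assumptions: SendErrorHandler, onCompletion, and loggedErrors are hypothetical names that do not exist in Flink or the Kafka client, and an in-memory list stands in for the real logger so the example has no dependencies.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the proposed failOnError flag.
 * Not part of Flink or Kafka; names are for illustration only.
 */
final class SendErrorHandler {
    private final boolean failOnError;
    // Stand-in for a real logger, so the sketch stays dependency-free.
    private final List<String> loggedErrors = new ArrayList<>();

    SendErrorHandler(boolean failOnError) {
        this.failOnError = failOnError;
    }

    /** Called when a send completes; exception is null on success. */
    void onCompletion(Exception exception) {
        if (exception == null) {
            return; // successful send, nothing to do
        }
        if (failOnError) {
            // Same outcome as today: the error propagates and fails the job.
            throw new RuntimeException("Failed to send record to Kafka", exception);
        }
        // Proposed outcome: record the error and keep processing.
        loggedErrors.add(exception.getMessage());
    }

    List<String> loggedErrors() {
        return loggedErrors;
    }
}
```

With failOnError set to false, a failed send only adds an entry to the log and the caller continues; with true, the same failure surfaces as an exception. How this would be wired into KafkaSinkBuilder and the writer is left open for discussion.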