C0urante commented on PR #14372: URL: https://github.com/apache/kafka/pull/14372#issuecomment-2148166741
Thanks for the example and the rationale, @vamossagar12. I agree now that it makes sense to surface these kinds of errors by failing connectors/tasks, since they do indicate unhealthy conditions that prevent the connector from functioning properly and will likely require intervention by users and cluster administrators.

On the flip side, I still think some kind of retry logic (or at least not letting the work thread die) is warranted. If someone misconfigures the ACLs on their Kafka cluster and revokes permission for an otherwise-healthy Connect worker to access a source offsets topic, it'd be pretty frustrating to require a restart of every worker in the cluster even if the ACLs were fixed almost immediately. The same applies to a botched SSL cert update.

So failing in-flight callbacks when an unexpected error is encountered during a read to the end of the log (at least for the offsets topic) seems reasonable, but instead of failing the offsets log permanently, we should continue to allow follow-up attempts that may later succeed.
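To make the suggestion concrete, here is a minimal sketch of the behavior described above. It is not the actual Connect code: `RetryingLogReader`, `requestReadToEnd`, and `runOnce` are hypothetical names, and the `Callable` is just a stand-in for whatever reads to the end of the topic. The idea is that one iteration of the work loop fails the callbacks that were in flight when the read threw, but swallows the exception so the loop can keep serving later requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.function.BiConsumer;

/** Hypothetical sketch only; names and structure are assumptions, not the real implementation. */
public class RetryingLogReader {
    // Callbacks waiting on the next read to the end of the log.
    private final List<BiConsumer<Long, Throwable>> pending = new ArrayList<>();
    // Stand-in for the real read: returns the end offset or throws (e.g. on an ACL or SSL failure).
    private final Callable<Long> reader;

    public RetryingLogReader(Callable<Long> reader) {
        this.reader = reader;
    }

    /** Register a callback to be completed by the next read attempt. */
    public synchronized void requestReadToEnd(BiConsumer<Long, Throwable> callback) {
        pending.add(callback);
    }

    /** One iteration of the work loop. Returns true if the read succeeded. */
    public boolean runOnce() {
        List<BiConsumer<Long, Throwable>> inFlight;
        synchronized (this) {
            // Take ownership of the callbacks that are in flight for this attempt.
            inFlight = new ArrayList<>(pending);
            pending.clear();
        }
        try {
            Long endOffset = reader.call();
            for (BiConsumer<Long, Throwable> cb : inFlight) {
                cb.accept(endOffset, null);
            }
            return true;
        } catch (Exception e) {
            // Surface the error to the callers that were waiting on this attempt...
            for (BiConsumer<Long, Throwable> cb : inFlight) {
                cb.accept(null, e);
            }
            // ...but do not rethrow, so the work thread survives and a later attempt can succeed.
            return false;
        }
    }
}
```

With something along these lines, a request made while the ACLs are broken would fail quickly, and a request made after they are fixed would succeed without restarting the worker.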