C0urante commented on PR #14372:
URL: https://github.com/apache/kafka/pull/14372#issuecomment-2148166741

   Thanks for the example and the rationale, @vamossagar12.
   
   I agree now that it makes sense to surface these kinds of errors by failing 
connectors/tasks, since they do indicate unhealthy conditions that prevent the 
connector from functioning properly, and will likely require intervention by 
users and cluster administrators.
   
   On the flip side, though, I still think some kind of retry logic (or at 
least not letting the work thread die) is warranted. If someone misconfigures 
the ACLs on their Kafka cluster and revokes an otherwise-healthy Connect 
worker's permission to access a source offsets topic, it'd be pretty 
frustrating to have to restart every worker in the cluster, even if the ACLs 
were fixed almost immediately. The same applies to a botched SSL certificate 
update.
   
   So failing in-flight callbacks when an unexpected error is encountered 
during a read to the end of the log--at least for the offsets topic--seems 
reasonable, but instead of failing the offsets log permanently, we should 
continue to allow follow-up attempts that may later succeed.
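   To make the suggestion concrete, here's a rough sketch in Java. All names 
(`RetryableOffsetLog`, `ReadCallback`, `poll`) are hypothetical, and it isn't 
meant to mirror `KafkaBasedLog`'s actual code; the `Callable<Long>` just 
stands in for a consumer reading to the end of the offsets topic. The point is 
the policy: an unexpected error fails only the callbacks that were in flight 
for that attempt, the work-loop step never throws, and no permanent failure 
state is recorded, so follow-up requests get a fresh attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch: illustrates the "fail in-flight callbacks, keep the
// log usable" policy only. Not Kafka's real KafkaBasedLog implementation.
class RetryableOffsetLog {

    // Hypothetical callback with an (error, result) shape.
    interface ReadCallback {
        void onCompletion(Throwable error, Long endOffset);
    }

    // Stands in for "read to the end of the offsets topic"; may throw,
    // e.g. on an authorization or SSL failure.
    private final Callable<Long> readToEndOp;
    private final List<ReadCallback> pending = new ArrayList<>();

    RetryableOffsetLog(Callable<Long> readToEndOp) {
        this.readToEndOp = readToEndOp;
    }

    // Queue a request; it is serviced by the next call to poll().
    synchronized void readToEnd(ReadCallback callback) {
        pending.add(callback);
    }

    // One iteration of the work thread's loop. It never throws, so the
    // thread survives errors, and it never records a permanent failure,
    // so a later request gets a fresh attempt.
    synchronized void poll() {
        if (pending.isEmpty()) {
            return;
        }
        List<ReadCallback> inFlight = new ArrayList<>(pending);
        pending.clear();

        Long endOffset = null;
        Throwable error = null;
        try {
            endOffset = readToEndOp.call();
        } catch (Exception e) {
            error = e; // fail only the requests that were in flight
        }

        for (ReadCallback callback : inFlight) {
            try {
                callback.onCompletion(error, endOffset);
            } catch (RuntimeException e) {
                // A misbehaving callback must not kill the work thread either.
            }
        }
    }
}
```

   With a `readToEndOp` that throws once (say, while the ACLs are broken) and 
then succeeds, the first `poll()` completes its callback with the error and 
the second completes with the end offset; no worker restart is needed. A real 
implementation would presumably also add backoff between attempts and, as 
discussed above, surface the error by failing the affected connectors/tasks.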


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
