[ https://issues.apache.org/jira/browse/KAFKA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764437#comment-17764437 ]

Tom Bentley commented on KAFKA-15459:
-------------------------------------

Is this _really_ the best compromise? AFAICS the linked PR and issue don't contain enough information to know what was considered. The loss of specific error codes seems like a big disadvantage to me. Taken to its logical conclusion it would seem we only need a single error code to represent all retriable errors.

> Convert coordinator retriable errors to a known producer response error.
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-15459
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15459
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 3.6.0
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Blocker
>             Fix For: 3.6.0
>
>
> While reviewing [https://github.com/apache/kafka/pull/14370] I added some of
> the documentation for the returned errors in the produce response as well.
> There were concerns about the new errors:
>  * {@link Errors#COORDINATOR_LOAD_IN_PROGRESS}
>  * {@link Errors#COORDINATOR_NOT_AVAILABLE}
>  * {@link Errors#INVALID_TXN_STATE}
>  * {@link Errors#INVALID_PRODUCER_ID_MAPPING}
>  * {@link Errors#CONCURRENT_TRANSACTIONS}
> The coordinator load, not available, and concurrent transactions errors
> should be retriable.
> The invalid txn state and pid mapping errors should be abortable.
> This is how older java clients handle the errors, but it is unclear how other
> clients handle them. It seems that rdkafka (for example) treats the abortable
> errors as fatal instead. The coordinator errors are retriable but not the
> concurrent transactions error.
> It seems acceptable for the abortable errors to be fatal on some clients
> since the error is likely on a zombie producer or in a state that may be
> harder to recover from. However, for the retriable errors, we can return
> NOT_ENOUGH_REPLICAS which is a known retriable response. We can use the
> produce api's response string to specify the real cause of the error for
> debugging.
> There were trade-offs between making the older clients work and for clarity
> in errors. This seems to be the best compromise.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)