[
https://issues.apache.org/jira/browse/KAFKA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764437#comment-17764437
]
Tom Bentley commented on KAFKA-15459:
-------------------------------------
Is this _really_ the best compromise? AFAICS the linked PR and issue don't
contain enough information to know what was considered.
The loss of specific error codes seems like a big disadvantage to me. Taken to
its logical conclusion, it would seem we only need a single error code to
represent all retriable errors.
> Convert coordinator retriable errors to a known producer response error.
> ------------------------------------------------------------------------
>
> Key: KAFKA-15459
> URL: https://issues.apache.org/jira/browse/KAFKA-15459
> Project: Kafka
> Issue Type: Sub-task
> Affects Versions: 3.6.0
> Reporter: Justine Olshan
> Assignee: Justine Olshan
> Priority: Blocker
> Fix For: 3.6.0
>
>
> While reviewing [https://github.com/apache/kafka/pull/14370] I added some of
> the documentation for the returned errors in the produce response as well.
> There were concerns about the new errors:
> * {@link Errors#COORDINATOR_LOAD_IN_PROGRESS}
> * {@link Errors#COORDINATOR_NOT_AVAILABLE}
> * {@link Errors#INVALID_TXN_STATE}
> * {@link Errors#INVALID_PRODUCER_ID_MAPPING}
> * {@link Errors#CONCURRENT_TRANSACTIONS}
> The coordinator load, not available, and concurrent transactions errors
> should be retriable.
> The invalid txn state and pid mapping errors should be abortable.
> This is how older Java clients handle the errors, but it is unclear how other
> clients handle them. It seems that rdkafka (for example) treats the abortable
> errors as fatal instead, and treats the coordinator errors as retriable but
> not the concurrent transactions error.
> It seems acceptable for the abortable errors to be fatal on some clients
> since the error is likely on a zombie producer or in a state that may be
> harder to recover from. However, for the retriable errors, we can return
> NOT_ENOUGH_REPLICAS, which is a known retriable response. We can use the
> produce API's response string to specify the real cause of the error for
> debugging.
> There were trade-offs between keeping the older clients working and clarity
> in the errors. This seems to be the best compromise.
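
For readers following the thread, the conversion described in the issue can be sketched roughly as below. This is an illustration only, not the code from the linked PR: the Errors enum is a small stand-in for Kafka's real error codes, and PartitionError, toProduceError and the message wording are hypothetical names chosen for the example. It assumes the abortable errors are passed through unchanged, which the issue implies but does not state outright.

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative sketch; names other than the error codes are hypothetical.
class ProduceErrorMapping {

    // Stand-in for the broker error codes named in the issue (not Kafka's Errors class).
    enum Errors {
        NOT_ENOUGH_REPLICAS,
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        CONCURRENT_TRANSACTIONS,
        INVALID_TXN_STATE,
        INVALID_PRODUCER_ID_MAPPING
    }

    // Error code plus the free-form string carried in the produce response.
    static final class PartitionError {
        final Errors code;
        final String message;

        PartitionError(Errors code, String message) {
            this.code = code;
            this.message = message;
        }
    }

    // The three errors the issue says should be retriable.
    private static final Set<Errors> RETRIABLE_COORDINATOR_ERRORS = EnumSet.of(
            Errors.COORDINATOR_LOAD_IN_PROGRESS,
            Errors.COORDINATOR_NOT_AVAILABLE,
            Errors.CONCURRENT_TRANSACTIONS);

    // Retriable coordinator errors become NOT_ENOUGH_REPLICAS, with the real
    // cause kept in the message for debugging. Everything else (including the
    // abortable INVALID_TXN_STATE and INVALID_PRODUCER_ID_MAPPING) is returned
    // unchanged, which is an assumption of this sketch.
    static PartitionError toProduceError(Errors coordinatorError) {
        if (RETRIABLE_COORDINATOR_ERRORS.contains(coordinatorError)) {
            return new PartitionError(
                    Errors.NOT_ENOUGH_REPLICAS,
                    "Retriable coordinator error: " + coordinatorError.name());
        }
        return new PartitionError(coordinatorError, null);
    }
}
```

With this shape, a client that only understands NOT_ENOUGH_REPLICAS retries as usual, while the original code survives only as text in the message. That is the loss of specificity the comment above objects to.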