[ 
https://issues.apache.org/jira/browse/KAFKA-15459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764437#comment-17764437
 ] 

Tom Bentley commented on KAFKA-15459:
-------------------------------------

Is this _really_ the best compromise? AFAICS the linked PR and issue don't 
contain enough information to know what was considered.

The loss of specific error codes seems like a big disadvantage to me. Taken to 
its logical conclusion it would seem we only need a single error code to 
represent all retriable errors.

> Convert coordinator retriable errors to a known producer response error.
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-15459
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15459
>             Project: Kafka
>          Issue Type: Sub-task
>    Affects Versions: 3.6.0
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Blocker
>             Fix For: 3.6.0
>
>
> While reviewing [https://github.com/apache/kafka/pull/14370] I added some of 
> the documentation for the returned errors in the produce response as well.
> There were concerns about the new errors:
>  * {@link Errors#COORDINATOR_LOAD_IN_PROGRESS}
>  * {@link Errors#COORDINATOR_NOT_AVAILABLE}
>  * {@link Errors#INVALID_TXN_STATE}
>  * {@link Errors#INVALID_PRODUCER_ID_MAPPING}
>  * {@link Errors#CONCURRENT_TRANSACTIONS}
> The coordinator load, not available, and concurrent transactions errors 
> should be retriable.
> The invalid txn state and pid mapping errors should be abortable.
> This is how older Java clients handle the errors, but it is unclear how other 
> clients handle them. It seems that rdkafka (for example) treats the abortable 
> errors as fatal instead, and treats the coordinator errors as retriable but 
> not the concurrent transactions error.
> It seems acceptable for the abortable errors to be fatal on some clients 
> since the error is likely on a zombie producer or in a state that may be 
> harder to recover from. However, for the retriable errors, we can return 
> NOT_ENOUGH_REPLICAS, which is a known retriable response. We can use the 
> produce API's response string to specify the real cause of the error for 
> debugging.
> There were trade-offs between keeping older clients working and keeping the 
> errors clear. This seems to be the best compromise.
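
For illustration, the mapping proposed in the description could be sketched roughly as follows. This is a minimal, self-contained sketch and not Kafka's actual code: the `Err` enum is a hypothetical stand-in for the relevant subset of `Errors`, and `PartitionError`, `toProduceError`, and the message format are assumptions made for the example.

```java
import java.util.EnumSet;
import java.util.Set;

public class ProduceErrorMapping {

    // Hypothetical stand-in for the subset of Kafka's Errors enum named in the issue.
    public enum Err {
        COORDINATOR_LOAD_IN_PROGRESS,
        COORDINATOR_NOT_AVAILABLE,
        CONCURRENT_TRANSACTIONS,
        INVALID_TXN_STATE,
        INVALID_PRODUCER_ID_MAPPING,
        NOT_ENOUGH_REPLICAS
    }

    // The three errors the description says should be retriable.
    public static final Set<Err> RETRIABLE_COORDINATOR_ERRORS = EnumSet.of(
            Err.COORDINATOR_LOAD_IN_PROGRESS,
            Err.COORDINATOR_NOT_AVAILABLE,
            Err.CONCURRENT_TRANSACTIONS);

    // Hypothetical (error code, message) pair as it would appear in a produce response.
    public static final class PartitionError {
        public final Err code;
        public final String message;

        public PartitionError(Err code, String message) {
            this.code = code;
            this.message = message;
        }
    }

    // Collapse the retriable coordinator errors into NOT_ENOUGH_REPLICAS and keep
    // the real cause only in the message string. All other errors pass through.
    public static PartitionError toProduceError(Err cause) {
        if (RETRIABLE_COORDINATOR_ERRORS.contains(cause)) {
            return new PartitionError(Err.NOT_ENOUGH_REPLICAS,
                    "Underlying cause: " + cause.name());
        }
        return new PartitionError(cause, cause.name());
    }

    public static void main(String[] args) {
        for (Err e : Err.values()) {
            PartitionError r = toProduceError(e);
            System.out.println(e + " -> " + r.code + " (" + r.message + ")");
        }
    }
}
```

In this sketch the three retriable causes become indistinguishable by error code, and a client can only tell them apart by parsing the message string, which is the loss of specificity raised in the comment above.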



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
