Justine Olshan created KAFKA-15459:
--------------------------------------

             Summary: Convert coordinator retriable errors to a known producer 
response error.
                 Key: KAFKA-15459
                 URL: https://issues.apache.org/jira/browse/KAFKA-15459
             Project: Kafka
          Issue Type: Sub-task
    Affects Versions: 3.6.0
            Reporter: Justine Olshan
            Assignee: Justine Olshan


While reviewing [https://github.com/apache/kafka/pull/14370], I also added 
documentation for the errors returned in the produce response.

There were concerns about the new errors:
* {@link Errors#COORDINATOR_LOAD_IN_PROGRESS}
* {@link Errors#COORDINATOR_NOT_AVAILABLE}
* {@link Errors#INVALID_TXN_STATE}
* {@link Errors#INVALID_PRODUCER_ID_MAPPING}
* {@link Errors#CONCURRENT_TRANSACTIONS}

The COORDINATOR_LOAD_IN_PROGRESS, COORDINATOR_NOT_AVAILABLE, and 
CONCURRENT_TRANSACTIONS errors should be retriable.

The INVALID_TXN_STATE and INVALID_PRODUCER_ID_MAPPING errors should be 
abortable.
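
A minimal sketch of the retriable/abortable split described above, in Java. 
The names TxnError, Handling, and TxnErrorHandling are hypothetical and exist 
only for illustration; they are not Kafka's Errors enum or any real client API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical stand-in for the five error names listed in this issue;
// NOT org.apache.kafka.common.protocol.Errors.
enum TxnError {
    COORDINATOR_LOAD_IN_PROGRESS,
    COORDINATOR_NOT_AVAILABLE,
    CONCURRENT_TRANSACTIONS,
    INVALID_TXN_STATE,
    INVALID_PRODUCER_ID_MAPPING
}

// How a client is expected to react to each error under the proposal.
enum Handling { RETRY, ABORT }

final class TxnErrorHandling {
    // Errors the client should back off on and resend.
    private static final Set<TxnError> RETRIABLE = EnumSet.of(
            TxnError.COORDINATOR_LOAD_IN_PROGRESS,
            TxnError.COORDINATOR_NOT_AVAILABLE,
            TxnError.CONCURRENT_TRANSACTIONS);

    private TxnErrorHandling() {}

    // Retriable errors map to RETRY; the remaining ones in this enum
    // (invalid txn state, invalid producer id mapping) map to ABORT.
    static Handling classify(TxnError error) {
        return RETRIABLE.contains(error) ? Handling.RETRY : Handling.ABORT;
    }
}
```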

This is how older Java clients handle these errors, but it is unclear how other 
clients handle them. For example, rdkafka appears to treat the abortable errors 
as fatal. It also treats the coordinator errors as retriable, but not the 
concurrent transactions error.

It seems acceptable for the abortable errors to be fatal on some clients, since 
these errors likely come from a zombie producer or from a state that may be 
harder to recover from. For the retriable errors, however, we can return 
NOT_ENOUGH_REPLICAS, which clients already know to be retriable. We can use the 
produce API's response string to specify the real cause of the error for 
debugging.
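
A rough sketch of the proposed conversion, in Java. WireError, PartitionError, 
and ProduceErrorConverter are hypothetical names (WireError stands in for 
Kafka's Errors enum), and the message text is made up. The sketch only shows 
the idea: retriable errors become NOT_ENOUGH_REPLICAS with the original error 
name kept in the message string, while abortable errors pass through unchanged.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical error names; NOT Kafka's actual Errors enum.
enum WireError {
    NOT_ENOUGH_REPLICAS,
    COORDINATOR_LOAD_IN_PROGRESS,
    COORDINATOR_NOT_AVAILABLE,
    CONCURRENT_TRANSACTIONS,
    INVALID_TXN_STATE,
    INVALID_PRODUCER_ID_MAPPING
}

// Hypothetical per-partition result: an error plus an optional message.
final class PartitionError {
    final WireError error;
    final String message;

    PartitionError(WireError error, String message) {
        this.error = error;
        this.message = message;
    }
}

final class ProduceErrorConverter {
    // Retriable errors that older or non-Java clients may not recognize
    // in a produce response.
    private static final Set<WireError> RETRIABLE = EnumSet.of(
            WireError.COORDINATOR_LOAD_IN_PROGRESS,
            WireError.COORDINATOR_NOT_AVAILABLE,
            WireError.CONCURRENT_TRANSACTIONS);

    private ProduceErrorConverter() {}

    // Retriable errors are reported as NOT_ENOUGH_REPLICAS, a known retriable
    // produce error, with the real cause recorded in the message for debugging.
    // All other errors are returned as-is with no message.
    static PartitionError toProduceError(WireError original) {
        if (RETRIABLE.contains(original)) {
            return new PartitionError(WireError.NOT_ENOUGH_REPLICAS,
                    "Original retriable error: " + original.name());
        }
        return new PartitionError(original, null);
    }
}
```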



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
