[ https://issues.apache.org/jira/browse/KAFKA-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17197368#comment-17197368 ]
Chia-Ping Tsai commented on KAFKA-10485:
----------------------------------------

REQUEST_TIMED_OUT is treated as a fatal error by TransactionManager (except in TxnOffsetCommitHandler). Would returning REQUEST_TIMED_OUT to the client cause compatibility problems?

> Use a separate error code for replication related errors
> --------------------------------------------------------
>
>                 Key: KAFKA-10485
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10485
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Guozhang Wang
>            Priority: Major
>
> Today, when a coordinator request involves an append to the internal topic,
> e.g. a commit / sync-group request sent to the group coordinator, we
> capture the following errors and translate them into COORDINATOR_NOT_AVAILABLE
> to return to the client:
> * UNKNOWN_TOPIC_OR_PARTITION
> * NOT_ENOUGH_REPLICAS
> * NOT_ENOUGH_REPLICAS_AFTER_APPEND
> * REQUEST_TIMED_OUT (for txn coordinator)
> Among those, the second and third cases are worth reconsidering, because
> COORDINATOR_NOT_AVAILABLE causes clients to re-discover the coordinator
> unnecessarily with a short backoff time. The fourth case is probably also
> worth revisiting: although the motivation for using
> COORDINATOR_NOT_AVAILABLE is to let the client retry, it still incurs
> unnecessary coordinator re-discovery.
> It would be better if, for 2)/3), clients did not re-discover the
> coordinator but just retried with a longer backoff time, while exposing,
> through either a metric or warning logs, that some other broker, not the
> coordinator, is unavailable and is blocking this operation. For 4), clients
> can just retry without re-discovery.
> Only for 1) does it make sense to let the clients re-discover the coordinator.
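
The per-error handling proposed in the issue description can be sketched as a small lookup table. This is only an illustration of the proposal, not Kafka code: the class ReplicationErrorHandling, the enums AppendError and ClientAction, and the method actionFor are hypothetical names, and the error constants are redeclared locally rather than taken from Kafka's own error enum.

```java
import java.util.EnumMap;
import java.util.Map;

public class ReplicationErrorHandling {

    /** Append errors listed in the issue description (local enum, not Kafka's own). */
    enum AppendError {
        UNKNOWN_TOPIC_OR_PARTITION,
        NOT_ENOUGH_REPLICAS,
        NOT_ENOUGH_REPLICAS_AFTER_APPEND,
        REQUEST_TIMED_OUT
    }

    /** Hypothetical client reactions; the names are illustrative only. */
    enum ClientAction {
        REDISCOVER_COORDINATOR,
        RETRY_WITH_LONGER_BACKOFF,
        RETRY_WITHOUT_REDISCOVERY
    }

    // Proposed mapping from the description: only case 1) triggers re-discovery.
    private static final Map<AppendError, ClientAction> PROPOSED =
            new EnumMap<>(AppendError.class);

    static {
        PROPOSED.put(AppendError.UNKNOWN_TOPIC_OR_PARTITION,
                ClientAction.REDISCOVER_COORDINATOR);
        PROPOSED.put(AppendError.NOT_ENOUGH_REPLICAS,
                ClientAction.RETRY_WITH_LONGER_BACKOFF);
        PROPOSED.put(AppendError.NOT_ENOUGH_REPLICAS_AFTER_APPEND,
                ClientAction.RETRY_WITH_LONGER_BACKOFF);
        PROPOSED.put(AppendError.REQUEST_TIMED_OUT,
                ClientAction.RETRY_WITHOUT_REDISCOVERY);
    }

    /** Returns the client action the proposal suggests for the given append error. */
    static ClientAction actionFor(AppendError error) {
        return PROPOSED.get(error);
    }

    public static void main(String[] args) {
        // Print the proposed action for every error in the list.
        for (AppendError error : AppendError.values()) {
            System.out.println(error + " -> " + actionFor(error));
        }
    }
}
```

Today all four errors collapse into COORDINATOR_NOT_AVAILABLE, which corresponds to every entry in this table mapping to REDISCOVER_COORDINATOR. The compatibility question raised in the comment is a reminder that existing clients may not treat the fourth entry as retriable.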