[ https://issues.apache.org/jira/browse/KAFKA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17150583#comment-17150583 ]
Chia-Ping Tsai commented on KAFKA-10228:
----------------------------------------

It seems the timeout is handled locally by the client, and the error is always set to Errors.NETWORK_EXCEPTION.

{code}
private void handleTimedOutRequests(List<ClientResponse> responses, long now) {
    List<String> nodeIds = this.inFlightRequests.nodesWithTimedOutRequests(now);
    for (String nodeId : nodeIds) {
        // close connection to the node
        this.selector.close(nodeId);
        log.debug("Disconnecting from node {} due to request timeout.", nodeId);
        processDisconnection(responses, nodeId, now, ChannelState.LOCAL_CLOSE);
    }
}
{code}

{code}
if (response.wasDisconnected()) {
    log.trace("Cancelled request with header {} due to node {} being disconnected",
        requestHeader, response.destination());
    for (ProducerBatch batch : batches.values())
        completeBatch(batch, new ProduceResponse.PartitionResponse(Errors.NETWORK_EXCEPTION), correlationId, now);
{code}

Perhaps we could add a new flag, similar to "disconnected", to indicate that the disconnection was caused by a local timeout.

> producer: NETWORK_EXCEPTION is thrown instead of a request timeout
> ------------------------------------------------------------------
>
>                 Key: KAFKA-10228
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10228
>             Project: Kafka
>          Issue Type: Improvement
>          Components: clients
>    Affects Versions: 2.3.1
>            Reporter: Christian Becker
>            Priority: Major
>
> We're currently seeing an issue with the Java client (producer) when message
> producing runs into a timeout: a NETWORK_EXCEPTION is thrown instead
> of a timeout exception.
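The flag proposed in the comment can be sketched outside Kafka as follows. This is a minimal, self-contained illustration under stated assumptions: `LocalTimeoutSketch`, `InFlight`, `Response`, and the `ErrorCode` enum are hypothetical stand-ins, not Kafka's `NetworkClient`, `ClientResponse`, or `Errors`, and `timedOut` is the proposed flag. Like the quoted `handleTimedOutRequests`, it fails every in-flight request to a node once any request to that node exceeds the timeout. The strict `>` comparison is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed flag; none of these types are Kafka's real classes.
public class LocalTimeoutSketch {

    // Stand-ins for two constants of org.apache.kafka.common.protocol.Errors.
    enum ErrorCode { NETWORK_EXCEPTION, REQUEST_TIMED_OUT }

    // Simplified in-flight request: destination node and send timestamp.
    static final class InFlight {
        final String nodeId;
        final long sendTimeMs;

        InFlight(String nodeId, long sendTimeMs) {
            this.nodeId = nodeId;
            this.sendTimeMs = sendTimeMs;
        }
    }

    // Simplified response: the existing "disconnected" flag plus the proposed "timedOut" flag.
    static final class Response {
        final String nodeId;
        final boolean disconnected;
        final boolean timedOut;

        Response(String nodeId, boolean disconnected, boolean timedOut) {
            this.nodeId = nodeId;
            this.disconnected = disconnected;
            this.timedOut = timedOut;
        }
    }

    // Find nodes with an expired request, "close" each of them, and fail
    // every request in flight to those nodes, marking the local timeout as the cause.
    static List<Response> handleTimedOutRequests(List<InFlight> inFlight, long nowMs, long requestTimeoutMs) {
        Set<String> timedOutNodes = new LinkedHashSet<>();
        for (InFlight req : inFlight) {
            if (nowMs - req.sendTimeMs > requestTimeoutMs) {
                timedOutNodes.add(req.nodeId);
            }
        }
        List<Response> responses = new ArrayList<>();
        for (InFlight req : inFlight) {
            if (timedOutNodes.contains(req.nodeId)) {
                // Still a disconnect, but flagged as caused by the local timeout.
                responses.add(new Response(req.nodeId, true, true));
            }
        }
        return responses;
    }

    // Check the more specific flag first, because a local timeout is also a disconnect.
    static ErrorCode errorFor(Response response) {
        if (response.timedOut) {
            return ErrorCode.REQUEST_TIMED_OUT;
        }
        if (response.disconnected) {
            return ErrorCode.NETWORK_EXCEPTION;
        }
        throw new IllegalArgumentException("response was not a failure");
    }

    public static void main(String[] args) {
        List<InFlight> inFlight = List.of(
                new InFlight("1", 0),     // sent at t=0 ms
                new InFlight("2", 150));  // sent at t=150 ms
        // With request.timeout.ms = 200, at t=250 ms only node 1 has an expired request.
        for (Response r : handleTimedOutRequests(inFlight, 250, 200)) {
            System.out.println("node " + r.nodeId + " -> " + errorFor(r)); // prints "node 1 -> REQUEST_TIMED_OUT"
        }
        // A disconnect that was not caused by a local timeout keeps NETWORK_EXCEPTION.
        System.out.println(errorFor(new Response("3", true, false))); // prints "NETWORK_EXCEPTION"
    }
}
```

Flagging every request on the closed connection as timed out, rather than only the expired one, is a choice made for this sketch; the comment does not say which requests the new flag should cover.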
>
> *Situation and relevant code:*
>
> Config
> {code:java}
> request.timeout.ms: 200
> retries: 3
> acks: all{code}
> {code:java}
> for (UnpublishedEvent event : unpublishedEvents) {
>     ListenableFuture<SendResult<String, String>> future;
>     future = kafkaTemplate.send(new ProducerRecord<>(event.getTopic(),
>         event.getKafkaKey(), event.getPayload()));
>     futures.add(future.completable());
> }
> CompletableFuture.allOf(futures.stream().toArray(CompletableFuture[]::new)).join();{code}
> We're using the KafkaTemplate from Spring Boot here, but it shouldn't matter,
> as it's merely a wrapper. We put batches of messages in there to be sent.
> 200 ms later, we see the following in the logs (the order is uncertain: both
> entries arrived in the same millisecond, so our logging system might not
> display them in the right order):
> {code:java}
> [Producer clientId=producer-1] Received invalid metadata error in produce
> request on partition events-6 due to
> org.apache.kafka.common.errors.NetworkException: The server disconnected
> before a response was received.. Going to request metadata update now
> [Producer clientId=producer-1] Got error produce response with correlation id
> 3094 on topic-partition events-6, retrying (2 attempts left). Error:
> NETWORK_EXCEPTION {code}
> There is also a corresponding error on the broker (within a few ms):
> {code:java}
> Attempting to send response via channel for which there is no open
> connection, connection id XXX (kafka.network.Processor) {code}
> This was somewhat unexpected and sent us hunting across the infrastructure
> for possible connection issues, but we found none.
> Side note: in some cases the retries worked and the messages were
> produced successfully.
> Only after many hours of heavy debugging did we notice that the error might
> be related to the low timeout setting. We've now removed that setting, as it
> was a remnant from the past and no longer valid for our use case.
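The send-and-join loop in the report can be imitated with the JDK alone to show where the exception type reaches application code. This is a hedged sketch, not the reporter's code: `JoinCauseSketch`, `awaitAll`, and the two `Simulated...Exception` classes are hypothetical stand-ins for the Kafka client exceptions, and the futures are completed directly instead of by a producer. A wrapper such as KafkaTemplate may add another layer of wrapping that this sketch skips.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

// JDK-only sketch of the send-and-join pattern; no Kafka or Spring types are used.
public class JoinCauseSketch {

    // Hypothetical stand-ins for the client exceptions an application would dispatch on.
    static class SimulatedNetworkException extends RuntimeException {
        SimulatedNetworkException(String message) { super(message); }
    }

    static class SimulatedTimeoutException extends RuntimeException {
        SimulatedTimeoutException(String message) { super(message); }
    }

    // Waits for every send; returns a label for the failure, or "ok" if all succeeded.
    static String awaitAll(List<CompletableFuture<String>> futures) {
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
            return "ok";
        } catch (CompletionException e) {
            // join() wraps the failure; the cause is the exception the send completed with.
            Throwable cause = e.getCause();
            if (cause instanceof SimulatedTimeoutException) {
                return "timeout: " + cause.getMessage();
            }
            if (cause instanceof SimulatedNetworkException) {
                return "network: " + cause.getMessage();
            }
            return "other: " + cause;
        }
    }

    public static void main(String[] args) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        futures.add(CompletableFuture.completedFuture("event-1"));
        // Simulates what the reporter saw: a network-style error for what was really a local timeout.
        CompletableFuture<String> failed = new CompletableFuture<>();
        failed.completeExceptionally(
                new SimulatedNetworkException("The server disconnected before a response was received."));
        futures.add(failed);
        System.out.println(awaitAll(futures)); // prints "network: The server disconnected before a response was received."
    }
}
```

Because the handler branches on the exception type, a network-style error steers the investigation toward connectivity even when the real cause is the low `request.timeout.ms`.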
> However, to keep others from running into the same issue and to simplify
> future debugging, some form of timeout exception should be thrown.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)