[
https://issues.apache.org/jira/browse/KAFKA-6582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16878462#comment-16878462
]
Shiran Azran commented on KAFKA-6582:
-------------------------------------
Hello,
We have also been seeing this issue on version 2.0.1 for the past several
months, since our last upgrade.
Is it resolved in version 2.1.1?
Thanks.
> Partitions get underreplicated, with a single ISR, and doesn't recover. Other
> brokers do not take over and we need to manually restart the broker.
> --------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-6582
> URL: https://issues.apache.org/jira/browse/KAFKA-6582
> Project: Kafka
> Issue Type: Bug
> Components: network
> Affects Versions: 1.0.0
> Environment: Ubuntu 16.04
> Linux kafka04 4.4.0-109-generic #132-Ubuntu SMP Tue Jan 9 19:52:39 UTC 2018
> x86_64 x86_64 x86_64 GNU/Linux
> java version "9.0.1"
> Java(TM) SE Runtime Environment (build 9.0.1+11)
> Java HotSpot(TM) 64-Bit Server VM (build 9.0.1+11, mixed mode)
> but also tried with the latest JVM 8 before with the same result.
> Reporter: Jurriaan Pruis
> Priority: Major
> Attachments: Screenshot 2019-01-18 at 13.08.17.png, Screenshot
> 2019-01-18 at 13.16.59.png
>
>
> Partitions become under-replicated, with a single ISR, and don't recover. Other
> brokers do not take over and we need to manually restart the 'single ISR'
> broker (if you describe the partitions of a replicated topic, it is clear that
> some partitions are only in sync on this broker).
> This bug closely resembles KAFKA-4477, but since that issue is marked as
> resolved, this is probably a different bug with similar symptoms.
> We see the same issue (or at least one that looks very similar) on Kafka 1.0.
> We've had these problems since upgrading from Kafka 0.10.2.1 to Kafka 1.0 in
> November 2017.
> This happens roughly every 24-48 hours on a random broker, which is why we
> currently have a cronjob that restarts every broker every 24 hours.
> While the issue is occurring, the server log on the single-ISR broker shows:
> {code:java}
> [2018-02-20 12:02:08,342] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.148.20:56352-96708 (kafka.network.Processor)
> [2018-02-20 12:02:08,364] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.150.25:54412-96715 (kafka.network.Processor)
> [2018-02-20 12:02:08,349] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.149.18:35182-96705 (kafka.network.Processor)
> [2018-02-20 12:02:08,379] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.150.25:54456-96717 (kafka.network.Processor)
> [2018-02-20 12:02:08,448] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.159.20:36388-96720 (kafka.network.Processor)
> [2018-02-20 12:02:08,683] WARN Attempting to send response via channel for
> which there is no open connection, connection id
> 10.132.0.32:9092-10.14.157.110:41922-96740 (kafka.network.Processor)
> {code}
> Also on the single-ISR broker, the controller log shows this:
> {code:java}
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-3-send-thread]:
> Controller 3 connected to 10.132.0.32:9092 (id: 3 rack: null) for sending
> state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,927] INFO [Controller-3-to-broker-0-send-thread]:
> Controller 3 connected to 10.132.0.10:9092 (id: 0 rack: null) for sending
> state change requests (kafka.controller.RequestSendThread)
> [2018-02-20 12:02:14,928] INFO [Controller-3-to-broker-1-send-thread]:
> Controller 3 connected to 10.132.0.12:9092 (id: 1 rack: null) for sending
> state change requests (kafka.controller.RequestSendThread)
> {code}
> And the non-ISR brokers show this kind of error:
>
> {code:java}
> [2018-02-20 12:02:29,204] WARN [ReplicaFetcher replicaId=1, leaderId=3,
> fetcherId=0] Error in fetch to broker 3, request (type=FetchRequest,
> replicaId=1, maxWait=500, minBytes=1, maxBytes=10485760,
> fetchData={......................}, isolationLevel=READ_UNCOMMITTED)
> (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response was
> read
> at
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:95)
> at
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:205)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:41)
> at
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:149)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:113)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> {code}
>
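For anyone trying to confirm the same symptom, the "describe the partitions" check mentioned in the description can be scripted. Below is a minimal sketch in Python. It assumes the per-partition line layout that `kafka-topics.sh --describe` prints (Topic, Partition, Leader, Replicas, Isr, with a numeric leader id). The function name `single_isr_partitions` and the sample topic `events` are hypothetical and not taken from the report.

```python
import re

# One partition line of `kafka-topics.sh --describe` output (assumed layout).
# The per-topic summary line (PartitionCount, ReplicationFactor, ...) does not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+)\s+Replicas:\s*(?P<replicas>[\d,]+)\s+"
    r"Isr:\s*(?P<isr>[\d,]*)"
)


def single_isr_partitions(describe_output):
    """Return (topic, partition, isr_broker) for every partition that has
    more than one assigned replica but exactly one in-sync replica."""
    found = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if match is None:
            continue  # summary lines, blank lines, unrelated output
        replicas = match.group("replicas").split(",")
        isr = [broker for broker in match.group("isr").split(",") if broker]
        if len(replicas) > 1 and len(isr) == 1:
            found.append(
                (match.group("topic"), int(match.group("partition")), int(isr[0]))
            )
    return found


# Hypothetical sample output: broker 3 is the only ISR for partitions 0 and 2.
SAMPLE = (
    "Topic:events\tPartitionCount:3\tReplicationFactor:3\tConfigs:\n"
    "\tTopic: events\tPartition: 0\tLeader: 3\tReplicas: 3,0,1\tIsr: 3\n"
    "\tTopic: events\tPartition: 1\tLeader: 0\tReplicas: 0,1,3\tIsr: 0,1,3\n"
    "\tTopic: events\tPartition: 2\tLeader: 3\tReplicas: 3,1,0\tIsr: 3\n"
)

for topic, partition, broker in single_isr_partitions(SAMPLE):
    print(f"{topic}-{partition}: only broker {broker} is in sync")
```

If every reported partition names the same broker, that matches the 'single ISR' broker pattern described above. The sketch only reads the describe output and does not restart anything.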