Hi there,

This sounds similar to https://issues.apache.org/jira/browse/KAFKA-4477.
Have you tried 0.10.1.1?
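
Independent of the upgrade, a thread dump on the affected broker can show
whether the replica fetcher or controller send threads are actually stuck
rather than just slow. A minimal check, assuming you can run jstack against
the broker JVM (<broker-pid> below is a placeholder for the Kafka process id
on broker 3):

  jstack <broker-pid> | grep -A 20 "ReplicaFetcherThread"

If those threads show up blocked or waiting across several dumps, that would
point in the same direction as the ticket above.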

-Jason

On Fri, Jan 20, 2017 at 5:27 PM, Hui Yang <huiy...@expedia.com> wrote:

> Hi, Kafka Team
>
> This is Hui Yang from the Expedia engineering team, and I would like to ask
> a question about a Kafka 0.10 issue.
> Our team uses Kafka as part of our core infrastructure, and we recently
> upgraded from Kafka 0.8.2.2 to Kafka 0.10.1.0, but we hit an issue after
> the upgrade.
>
> The issue is as follows:
> Kafka 0.10 worked well for a couple of days after the upgrade, but then we
> started to see "java.io.IOException: Connection to 3 was disconnected
> before the response was read" on each Kafka broker when it tried to
> communicate with the controller (as you may know, one of the Kafka brokers
> acts as the controller to handle topic/partition assignment and
> state-change tasks; in our case it is broker 3).
> Even in the controller log I found "[Controller-3-to-broker-3-send-thread],
> Controller 3 epoch 3 fails to send request, java.io.IOException: Connection
> to 3 was disconnected before the response was read", so it looks like the
> controller is not even able to send messages to itself.
> After we had seen those exceptions on the brokers for a while, we started
> to see timeout exceptions on the producer side; our producers were no
> longer able to send messages to the brokers.
>
> When I checked the JMX metrics, I found that the controller's CPU usage has
> been consistently higher than that of the other brokers since we upgraded
> to Kafka 0.10 (all brokers had similar CPU usage on Kafka 0.8), and that
> memory usage spiked specifically on the controller while the issue was
> happening. I assume the controller may not have had enough memory left to
> create new connections for the producers and the other brokers.
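>
> In case it helps, the memory readings above can be reproduced with the
> bundled kafka.tools.JmxTool; in the sketch below, <broker-host> and 9999
> are placeholders for the broker's hostname and whatever JMX_PORT the broker
> was started with:
>
> bin/kafka-run-class.sh kafka.tools.JmxTool \
>   --jmx-url service:jmx:rmi:///jndi/rmi://<broker-host>:9999/jmxrmi \
>   --object-name java.lang:type=Memory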
>
> One more thing to mention: we keep the Kafka 0.8 protocol and message
> format on the Kafka 0.10 brokers so that we can still use 0.8 clients.
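>
> Concretely, by "0.8 protocol and format" I mean the brokers run with
> settings along the lines of the following in server.properties (the exact
> version values here are illustrative):
>
> inter.broker.protocol.version=0.8.2
> log.message.format.version=0.8.2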
>
> Details of the exceptions:
> " WARN [ReplicaFetcherThread-0-3], Error in fetch kafka.server.
> ReplicaFetcherThread$FetchRequest@87d8e00 (kafka.server.
> ReplicaFetcherThread)
> java.io.IOException: Connection to 3 was disconnected before the response
> was read
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> at scala.Option.foreach(Option.scala:257)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:112)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:108)
> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(
> NetworkClientBlockingOps.scala:137)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$
> NetworkClientBlockingOps$$pollContinuously$extension(
> NetworkClientBlockingOps.scala:143)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(
> NetworkClientBlockingOps.scala:108)
> at kafka.server.ReplicaFetcherThread.sendRequest(
> ReplicaFetcherThread.scala:253)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> at kafka.server.AbstractFetcherThread.processFetchRequest(
> AbstractFetcherThread.scala:118)
> at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:
> 103)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)"
>
> "WARN [Controller-3-to-broker-3-send-thread], Controller 3 epoch 1 fails
> to send request
> java.io.IOException: Connection to 2 was disconnected before the response
> was read
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:115)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.scala:112)
> at scala.Option.foreach(Option.scala:257)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:112)
> at kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$
> extension$1.apply(NetworkClientBlockingOps.scala:108)
> at kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(
> NetworkClientBlockingOps.scala:137)
> at kafka.utils.NetworkClientBlockingOps$.kafka$utils$
> NetworkClientBlockingOps$$pollContinuously$extension(
> NetworkClientBlockingOps.scala:143)
> at kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(
> NetworkClientBlockingOps.scala:108)
> at kafka.controller.RequestSendThread.liftedTree1$
> 1(ControllerChannelManager.scala:190)
> at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.
> scala:181)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)”
>
> In production, we run 6 Kafka brokers and 3 ZooKeeper nodes on AWS, using
> c3.xlarge instances.
> Our JVM settings are as follows: -Xmx1G -Xms1G -server
> -XX:+UseCompressedOops -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSClassUnloadingEnabled -XX:+CMSScavengeBeforeRemark.
> Our traffic is about 500 messages per second, with an average message size
> of 100 KB.
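> (That works out to roughly 500 * 100 KB = ~50 MB/s of producer traffic into
> the cluster, before replication.)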
>
> We appreciate your taking the time to look at this; any help or suggestions
> about this issue would be great!
>
> Best,
>
> Hui
>
