[ 
https://issues.apache.org/jira/browse/KAFKA-6375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rong Tang updated KAFKA-6375:
-----------------------------
    Affects Version/s:     (was: 0.10.2.0)
                       0.10.1.0

> Follower replicas can never catch up to be ISR due to creating 
> ReplicaFetcherThread failed.
> -------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6375
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6375
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.10.1.0
>         Environment: Windows, 23-broker Kafka cluster
>            Reporter: Rong Tang
>
> Hi, I hit a case where the out-of-sync replicas on one broker never catch 
> up.
> When the broker starts up, it receives LeaderAndIsr requests from the 
> controller, which call createFetcherThread. The thread creation failed with 
> the exceptions below.
> After that, there is no fetcher for these follower replicas, so they stay out 
> of sync forever unless the broker later receives a LeaderAndIsr request with 
> a higher leader epoch. The broker had 260 of its 330 replicas out of sync for 
> one day, until I restarted it.
> Restarting the broker mitigates the issue.
> I have two questions.
> First, why did creating the new ReplicaFetcherThread fail?
> *Second, shouldn't Kafka do something to fail over, instead of leaving the 
> broker in an abnormal state?*
> This is a 23-broker Kafka cluster running on Windows. Each broker has 330 
> replicas.
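
The second question asks for some form of recovery when fetcher creation fails. As a rough illustration of one possible approach, the sketch below retries a failing creation step with exponential backoff instead of giving up after the first exception. It is not Kafka code: `RetryingCreate`, `createWithRetry`, and the stand-in factory are hypothetical names, and the real creation path in `AbstractFetcherManager` is not modeled here.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryingCreate {

    /**
     * Calls the factory until it succeeds or maxAttempts is reached,
     * doubling the sleep between attempts. Rethrows the last failure.
     * Hypothetical helper, not part of Kafka.
     */
    static <T> T createWithRetry(Callable<T> factory, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return factory.call();
            } catch (Exception e) {
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the failing constructor: fails twice, then succeeds.
        int[] calls = {0};
        String fetcher = createWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Unable to establish loopback connection");
            }
            return "fetcher-created";
        }, 5, 10L);
        System.out.println(fetcher + " after " + calls[0] + " attempts");
    }
}
```

Whether retrying is the right remedy depends on whether the underlying failure is transient, which the report does not establish.
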
> [2017-12-13 16:29:21,317] ERROR Error on broker 1000 while processing LeaderAndIsr request with correlationId 1 received from controller 427703487 epoch 22 (state.change.logger)
> org.apache.kafka.common.KafkaException: java.io.IOException: Unable to establish loopback connection
>       at org.apache.kafka.common.network.Selector.<init>(Selector.java:124)
>       at kafka.server.ReplicaFetcherThread.<init>(ReplicaFetcherThread.scala:87)
>       at kafka.server.ReplicaFetcherManager.createFetcherThread(ReplicaFetcherManager.scala:35)
>       at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:83)
>       at kafka.server.AbstractFetcherManager$$anonfun$addFetcherForPartitions$2.apply(AbstractFetcherManager.scala:78)
>       at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
>       at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:221)
>       at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>       at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:428)
>       at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
>       at kafka.server.AbstractFetcherManager.addFetcherForPartitions(AbstractFetcherManager.scala:78)
>       at kafka.server.ReplicaManager.makeFollowers(ReplicaManager.scala:869)
>       at kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:689)
>       at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:149)
>       at kafka.server.KafkaApis.handle(KafkaApis.scala:83)
>       at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Unable to establish loopback connection
>       at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:94)
>       at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at sun.nio.ch.PipeImpl.<init>(PipeImpl.java:171)
>       at sun.nio.ch.SelectorProviderImpl.openPipe(SelectorProviderImpl.java:50)
>       at java.nio.channels.Pipe.open(Pipe.java:155)
>       at sun.nio.ch.WindowsSelectorImpl.<init>(WindowsSelectorImpl.java:127)
>       at sun.nio.ch.WindowsSelectorProvider.openSelector(WindowsSelectorProvider.java:44)
>       at java.nio.channels.Selector.open(Selector.java:227)
>       at org.apache.kafka.common.network.Selector.<init>(Selector.java:122)
>       ... 16 more
> Caused by: java.net.ConnectException: Connection timed out: connect
>       at sun.nio.ch.Net.connect0(Native Method)
>       at sun.nio.ch.Net.connect(Net.java:454)
>       at sun.nio.ch.Net.connect(Net.java:446)
>       at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:648)
>       at java.nio.channels.SocketChannel.open(SocketChannel.java:189)
>       at sun.nio.ch.PipeImpl$Initializer$LoopbackConnector.run(PipeImpl.java:127)
>       at sun.nio.ch.PipeImpl$Initializer.run(PipeImpl.java:76)
>       ... 25 more
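
The trace shows the failure originating in `java.nio.channels.Selector.open()`, which on this Windows JDK goes through `Pipe.open()` and a loopback socket connect that timed out. A small standalone probe that exercises the same two JDK calls may help check whether the host can currently open selectors at all, independent of Kafka. This is only a diagnostic sketch: `LoopbackProbe` and `probe` are hypothetical names, and a passing result does not explain why the original connect timed out.

```java
import java.io.IOException;
import java.nio.channels.Pipe;
import java.nio.channels.Selector;

public class LoopbackProbe {

    /**
     * Opens and closes a Selector and a Pipe, the two JDK calls that
     * appear in the stack trace. Returns null on success, or a
     * description of the exception on failure.
     */
    static String probe() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.sink().close();
            pipe.source().close();
            return null;
        } catch (IOException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        String failure = probe();
        if (failure == null) {
            System.out.println("Selector.open() and Pipe.open() succeeded");
        } else {
            System.out.println("Probe failed: " + failure);
        }
    }
}
```

Running the probe repeatedly on the affected broker host, ideally while the problem is occurring, would show whether the failure is persistent or intermittent.
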



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
