Guys,
Another observation is that about 90% of the under-replicated partitions
have the same node as the follower.
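
In case it helps, this is roughly how we are listing the under-replicated
partitions and the lagging replica (the ZooKeeper host below is just a
placeholder for our actual quorum):

  bin/kafka-topics.sh --zookeeper zk1:2181 --describe --under-replicated-partitions

The replica missing from the Isr column is the same broker in most of these
cases.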

*Any help here is very much appreciated. We have very little time to
stabilize Kafka. Thanks a lot in advance.*

-Suman

On Thu, Dec 6, 2018 at 9:08 PM Suman B N <sumannew...@gmail.com> wrote:

> +users
>
> On Thu, Dec 6, 2018 at 9:01 PM Suman B N <sumannew...@gmail.com> wrote:
>
>> Team,
>>
>> We are observing ISR shrink and expand very frequently. In the logs of
>> the follower, the errors below are observed:
>>
>> [2018-12-06 20:00:42,709] WARN [ReplicaFetcherThread-2-15], Error in
>> fetch kafka.server.ReplicaFetcherThread$FetchRequest@a0f9ba9
>> (kafka.server.ReplicaFetcherThread)
>> java.io.IOException: Connection to 15 was disconnected before the
>> response was read
>>         at
>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3(NetworkClientBlockingOps.scala:114)
>>         at
>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$3$adapted(NetworkClientBlockingOps.scala:112)
>>         at scala.Option.foreach(Option.scala:257)
>>         at
>> kafka.utils.NetworkClientBlockingOps$.$anonfun$blockingSendAndReceive$1(NetworkClientBlockingOps.scala:112)
>>         at
>> kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:136)
>>         at
>> kafka.utils.NetworkClientBlockingOps$.pollContinuously$extension(NetworkClientBlockingOps.scala:142)
>>         at
>> kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
>>         at
>> kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:249)
>>         at
>> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:234)
>>         at
>> kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
>>         at
>> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
>>         at
>> kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
>>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
>>
>> Can someone explain this, and help us understand how we can resolve these
>> under-replicated partitions?
>>
>> server.properties file:
>> broker.id=15
>> port=9092
>> zookeeper.connect=zk1,zk2,zk3,zk4,zk5,zk6
>>
>> default.replication.factor=2
>> log.dirs=/data/kafka
>> delete.topic.enable=true
>> zookeeper.session.timeout.ms=10000
>> inter.broker.protocol.version=0.10.2
>> num.partitions=3
>> min.insync.replicas=1
>> log.retention.ms=259200000
>> message.max.bytes=20971520
>> replica.fetch.max.bytes=20971520
>> replica.fetch.response.max.bytes=20971520
>> max.partition.fetch.bytes=20971520
>> fetch.max.bytes=20971520
>> log.flush.interval.ms=5000
>> log.roll.hours=24
>> num.replica.fetchers=3
>> num.io.threads=8
>> num.network.threads=6
>> log.message.format.version=0.9.0.1
>>
>> Also, in what cases do we end up in this state? We have 1200-1400 topics
>> and 5000-6000 partitions spread across a 20-node cluster, but only 30-40
>> partitions are under-replicated while the rest are in sync. 95% of these
>> partitions have a replication factor of 2.
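>>
>> For a given affected topic we are looking at the assignment like this
>> (the topic name is just an example):
>>
>>   bin/kafka-topics.sh --zookeeper zk1:2181 --describe --topic example-topic
>>
>> which shows the Leader, Replicas and Isr per partition.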
>>
>> --
>> *Suman*
>>
>
>
> --
> *Suman*
> *OlaCabs*
>


-- 
*Suman*
*OlaCabs*
