Thanks Mani. I will monitor them.

On Fri, Oct 26, 2018 at 3:34 PM Manikumar <manikumar.re...@gmail.com> wrote:
> We can monitor the replica-related metrics below. Try tuning
> "replica.lag.time.max.ms" and "replica.fetch.max.bytes", and look for
> broker logs starting with "Shrinking ISR for partition ...".
>
> kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
> kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
> kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
> kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
>
> On Thu, Oct 25, 2018 at 7:18 PM Suman B N <sumannew...@gmail.com> wrote:
>
> > Still looking for some response here. Pls assist.
> >
> > On Sat, Oct 20, 2018 at 12:43 AM Suman B N <sumannew...@gmail.com> wrote:
> >
> > > The rate of ingestion is not 150-200 rps; it is 150k-200k rps.
> > >
> > > On Fri, Oct 19, 2018 at 11:12 PM Suman B N <sumannew...@gmail.com> wrote:
> > >
> > >> Team,
> > >> We have been observing some partitions being under-replicated. Broker
> > >> version is 0.10.2.1. The actions below were carried out, but in vain:
> > >>
> > >> - Tried restarting nodes.
> > >> - Tried increasing replica fetcher threads. Please recommend an ideal
> > >>   number of replica fetcher threads for a 20-node cluster with
> > >>   150-200 rps spread across 1000 topics and 3000 partitions.
> > >> - Tried increasing network threads. (I think this doesn't have any
> > >>   effect, but still wanted to try.) Please recommend an ideal number
> > >>   of network threads for the same cluster.
> > >>
> > >> Logs look very clean, with no exceptions. I don't have much idea how
> > >> replica fetcher threads and logs can be debugged, so asking for help
> > >> here. Any help or leads would be appreciated.
> > >>
> > >> --
> > >> *Suman*
> > >> *OlaCabs*
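As a rough aid for the log check suggested above, here is a minimal sketch that scans broker log lines for "Shrinking ISR for partition" events and counts them per topic-partition, to spot which partitions shrink most often. The sample line layout is an assumption modeled on 0.10.x-era broker output; adjust the regex to match your actual server.log format.

```python
import re
from collections import Counter

# Assumed log line shape (verify against your broker's server.log):
#   ... Shrinking ISR for partition [orders,3] from 7,8,9 to 7 ...
SHRINK_RE = re.compile(r"Shrinking ISR for partition \[([^,\]]+),(\d+)\]")

def count_isr_shrinks(lines):
    """Count 'Shrinking ISR' events per (topic, partition)."""
    counts = Counter()
    for line in lines:
        m = SHRINK_RE.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical sample lines; in practice, feed open("server.log").
    sample = [
        "[2018-10-19 11:02:11,001] INFO Partition [orders,3] on broker 7: "
        "Shrinking ISR for partition [orders,3] from 7,8,9 to 7 (kafka.cluster.Partition)",
        "[2018-10-19 11:05:42,310] INFO Partition [orders,3] on broker 7: "
        "Shrinking ISR for partition [orders,3] from 7,9 to 7 (kafka.cluster.Partition)",
    ]
    for (topic, part), n in sorted(count_isr_shrinks(sample).items()):
        print(f"{topic}-{part}: {n} shrink(s)")
```

For the JMX metrics themselves, the JmxTool class shipped with the broker (run via kafka-run-class.sh with kafka.tools.JmxTool and an --object-name such as kafka.server:type=ReplicaManager,name=IsrShrinksPerSec) can poll an MBean periodically against the broker's JMX port.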