[ https://issues.apache.org/jira/browse/KAFKA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233300#comment-17233300 ]

lqjacklee commented on KAFKA-10726:
-----------------------------------

A long stop-the-world garbage collection pause on the broker can keep it from
heartbeating to ZooKeeper within zookeeper.session.timeout.ms, which expires
its session. If you are seeing excessive GC pauses, consider upgrading your JDK
version or switching garbage collectors (or extending the timeout set by
zookeeper.session.timeout.ms). Additionally, you can tune your Java runtime to
minimize garbage collection. The engineers at LinkedIn have written about
optimizing JVM garbage collection in depth, and the Kafka documentation also
contains some recommendations.
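To check whether GC pauses are really the cause, you can scan the broker's GC log for pauses that approach the session timeout. The following is a minimal Python sketch, assuming the JDK 9+ unified logging format (-Xlog:gc), in which each pause line ends with a duration such as 12.345ms; JDK 8 -XX:+PrintGCDetails logs use a different layout and would need another pattern. The function name and the 50% threshold are hypothetical examples, not Kafka recommendations.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: JDK 9+ unified GC logging (-Xlog:gc), e.g.
#   [...][info][gc] GC(12) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 12.345ms
# Only lines containing "Pause" and ending in "<number>ms" are treated as
# stop-the-world pauses; concurrent phases (no "Pause") are ignored.
PAUSE_RE = re.compile(r"Pause.*?(\d+(?:\.\d+)?)ms\s*$")


def long_gc_pauses(
    lines: Iterable[str], session_timeout_ms: float, fraction: float = 0.5
) -> List[Tuple[int, float]]:
    """Return (line number, pause in ms) for every pause that reaches
    `fraction` of the ZooKeeper session timeout (hypothetical heuristic)."""
    threshold_ms = session_timeout_ms * fraction
    hits = []
    for line_no, line in enumerate(lines, start=1):
        match = PAUSE_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            hits.append((line_no, float(match.group(1))))
    return hits
```

For example, pass an open GC log file and 6000 (the broker default for zookeeper.session.timeout.ms before Kafka 2.5) to list pauses of 3 seconds or more; the log file name depends on your -Xlog/-Xloggc settings.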


The following ZooKeeper metrics can give you more insight:



||Name||Description||Metric type||Available via||
|outstanding_requests|Number of requests queued|Resource: Saturation|Four-letter words, AdminServer, JMX|
|avg_latency|Amount of time it takes to respond to a client request (ms)|Work: Throughput|Four-letter words, AdminServer, JMX|
|num_alive_connections|Number of clients connected to ZooKeeper|Resource: Availability|Four-letter words, AdminServer, JMX|
|followers|Number of active followers|Resource: Availability|Four-letter words, AdminServer|
|pending_syncs|Number of pending syncs from followers|Other|Four-letter words, AdminServer, JMX|
|open_file_descriptor_count|Number of file descriptors in use|Resource: Utilization|Four-letter words, AdminServer|
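The "Four-letter words" source in the table refers to ZooKeeper commands such as mntr, which returns these metrics as tab-separated zk_<name> lines (for example zk_outstanding_requests). The following is a minimal Python sketch that fetches and parses that output and flags values above example thresholds; the function names and thresholds are hypothetical. On ZooKeeper 3.5+, mntr must be allowed via 4lw.commands.whitelist, and zk_followers and zk_pending_syncs are only reported by the leader.

```python
import socket


def fetch_mntr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send the 'mntr' four-letter word and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"mntr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text: str) -> dict:
    """Parse 'key<TAB>value' lines; numeric values become int or float."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines
        try:
            metrics[key] = int(value)
        except ValueError:
            try:
                metrics[key] = float(value)
            except ValueError:
                metrics[key] = value  # e.g. zk_version, zk_server_state
    return metrics


def check(metrics: dict, max_outstanding: int = 10,
          max_avg_latency_ms: float = 100.0) -> list:
    """Return human-readable warnings; thresholds are illustrative only."""
    problems = []
    outstanding = metrics.get("zk_outstanding_requests", 0)
    if outstanding > max_outstanding:
        problems.append(f"zk_outstanding_requests={outstanding} exceeds {max_outstanding}")
    latency = metrics.get("zk_avg_latency", 0)
    if latency > max_avg_latency_ms:
        problems.append(f"zk_avg_latency={latency}ms exceeds {max_avg_latency_ms}ms")
    # Only present on the leader; non-zero means followers are lagging.
    pending = metrics.get("zk_pending_syncs", 0)
    if pending > 0:
        problems.append(f"zk_pending_syncs={pending}")
    return problems
```

A typical call is check(parse_mntr(fetch_mntr("zk1.example.com"))), where the host name is a placeholder; each returned string can be written to a log file that your existing forwarder already collects.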




> How to detect heartbeat failure between broker/zookeeper leader
> ---------------------------------------------------------------
>
>                 Key: KAFKA-10726
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10726
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller, logging
>    Affects Versions: 2.1.1
>            Reporter: Keiichiro Wakasa
>            Priority: Critical
>
> Hello experts,
> I'm not sure this is the proper place to ask, but I'd appreciate it if you
> could help us with the following question...
>  
> We keep suffering from brokers being excluded from the cluster because of
> heartbeat timeouts between the broker and the ZooKeeper leader.
> This issue can easily be detected by checking ephemeral nodes via zkcli.sh,
> but we'd like to detect it from logs such as server.log/controller.log, since
> we have an existing system that forwards these logs to our system.
> Looking at server.log/controller.log, we couldn't find any log entries that
> indicate the heartbeat timeout. Are there any other logs we can check for
> heartbeat health?
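The zkcli.sh check described in the question can also be scripted so that it writes ordinary log lines for an existing log forwarder to pick up. The following is a minimal Python sketch, assuming brokers register ephemeral znodes under /brokers/ids (prefix the path if zookeeper.connect uses a chroot). The function names are hypothetical, and reading ZooKeeper is left to an injected callable so the logic does not depend on a particular client library.

```python
import logging
from typing import Callable, Iterable, List, Set

log = logging.getLogger("broker_registration_check")  # hypothetical logger name


def missing_brokers(expected_ids: Iterable[int], registered: Iterable[str]) -> Set[int]:
    """Return expected broker ids that have no ephemeral znode."""
    present = {int(child) for child in registered}  # znode names are the ids
    return set(expected_ids) - present


def check_once(expected_ids: Iterable[int],
               list_children: Callable[[str], List[str]],
               path: str = "/brokers/ids") -> Set[int]:
    """List children of `path` and log one warning per missing broker."""
    missing = missing_brokers(expected_ids, list_children(path))
    for broker_id in sorted(missing):
        log.warning("broker %d has no ephemeral node under %s", broker_id, path)
    return missing
```

For example, with the third-party kazoo client (an assumption; any ZooKeeper client works), create KazooClient(hosts="zk1:2181"), call start(), and pass its get_children method as list_children from a cron job or polling loop; configure logging to write to a file your forwarder already ships.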



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
