[ 
https://issues.apache.org/jira/browse/HDDS-5794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sammi Chen updated HDDS-5794:
-----------------------------
    Fix Version/s: 1.2.0

> The misleading "No available thread in pool for past * sencond" log message 
> in DN StateContext
> ----------------------------------------------------------------------------------------------
>
>                 Key: HDDS-5794
>                 URL: https://issues.apache.org/jira/browse/HDDS-5794
>             Project: Apache Ozone
>          Issue Type: Bug
>            Reporter: Sammi Chen
>            Assignee: Sammi Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.2.0
>
>
> There are a lot of following messages in DN log file,  which makes me think 
> there is something wrong with the heartbeat channel between the DN and SCM.
> 2021-09-28 10:59:36,871 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,345 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,379 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,584 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,598 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,631 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,650 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,652 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,668 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,741 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,918 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,925 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,930 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,954 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,955 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:37,995 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,005 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,097 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,114 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> 2021-09-28 10:59:38,174 [Datanode State Machine Thread - 0] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.
> After add some extra debug info, I found the thing is not as the LOG shows,  
> the heartbeat channel is very healthy and this "No available thread in pool 
> for past 30 seconds"  is quite misleading.   The corrent part of the LOG is 
> "No available thread in pool",  the wrong part is "for the past 30 seconds". 
> 2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> heartbeat interval : 30000, next HB 6719157366
> 2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting 
> to execute the task 24
> 2021-09-29 11:28:01,257 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Sending heartbeat message
> 2021-09-29 11:28:01,261 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Received heartbeat response
> 2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished 
> executing the task 24
> 2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> Sleep 29996 ms
> 2021-09-29 11:28:05,622 [ChunkWriter-125-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-71-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> heartbeat interval : 30000, next HB 6719161731
> 2021-09-29 11:28:05,622 [ChunkWriter-165-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-53-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting 
> to execute the task 25
> 2021-09-29 11:28:05,622 [ChunkWriter-31-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-41-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [ChunkWriter-25-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> execute task.wait is interrupted.
> 2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> heartbeat interval : 30000, next HB 6719161732
> 2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting 
> to execute the task 26
> 2021-09-29 11:28:05,623 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Sending heartbeat message
> 2021-09-29 11:28:05,641 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Received heartbeat response
> 2021-09-29 11:28:05,642 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Sending heartbeat message
> 2021-09-29 11:28:05,652 [EndpointStateMachine task thread for 
> /9.186.21.245:9861 - 0 ] INFO 
> org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask:
>  Received heartbeat response
> 2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished 
> executing the task 26
> 2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> Sleep 29971 ms
> 2021-09-29 11:28:05,709 [ChunkWriter-73-0] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> triggerHeartbeat: interrupt stateMachineThread for ICR report
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> heartbeat interval : 30000, next HB 6719161818
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting 
> to execute the task 27
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] ERROR 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> execute task.wait is interrupted.
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
> org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
> heartbeat interval : 30000, next HB 6719161818
> 2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] WARN 
> org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
> available thread in pool for past 30 seconds.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to