Sammi Chen created HDDS-5794:
--------------------------------

             Summary: The misleading "No available thread in pool for past 
*sencond" message in DN StateContext
                 Key: HDDS-5794
                 URL: https://issues.apache.org/jira/browse/HDDS-5794
             Project: Apache Ozone
          Issue Type: Bug
            Reporter: Sammi Chen
            Assignee: Sammi Chen


There are a lot of following messages in DN log file,  which makes me think 
there is something wrong with the heartbeat channel between the DN and SCM.


2021-09-28 10:59:36,871 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,345 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,379 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,584 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,598 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,631 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,650 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,652 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,668 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,741 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,918 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,925 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,930 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,954 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,955 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:37,995 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:38,005 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:38,097 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:38,114 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.
2021-09-28 10:59:38,174 [Datanode State Machine Thread - 0] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.


After add some extra debug info, I found the thing is not as the LOG shows,  
the heartbeat channel is very healthy and this "No available thread in pool for 
past 30 seconds"  is quite misleading.   The corrent part of the LOG is "No 
available thread in pool",  the wrong part is "for the past 30 seconds". 


2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
heartbeat interval : 30000, next HB 6719157366
2021-09-29 11:28:01,257 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to 
execute the task 24
2021-09-29 11:28:01,257 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Sending heartbeat message
2021-09-29 11:28:01,261 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received heartbeat response
2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished 
executing the task 24
2021-09-29 11:28:01,261 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
Sleep 29996 ms
2021-09-29 11:28:05,622 [ChunkWriter-125-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [ChunkWriter-71-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
heartbeat interval : 30000, next HB 6719161731
2021-09-29 11:28:05,622 [ChunkWriter-165-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [ChunkWriter-53-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to 
execute the task 25
2021-09-29 11:28:05,622 [ChunkWriter-31-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [ChunkWriter-41-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [ChunkWriter-25-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,622 [DatanodeStateMachineDaemonThread] ERROR 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
execute task.wait is interrupted.
2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
heartbeat interval : 30000, next HB 6719161732
2021-09-29 11:28:05,623 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to 
execute the task 26
2021-09-29 11:28:05,623 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Sending heartbeat message
2021-09-29 11:28:05,641 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received heartbeat response
2021-09-29 11:28:05,642 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Sending heartbeat message
2021-09-29 11:28:05,652 [EndpointStateMachine task thread for 
/9.186.21.245:9861 - 0 ] INFO 
org.apache.hadoop.ozone.container.common.states.endpoint.HeartbeatEndpointTask: 
Received heartbeat response
2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Finished 
executing the task 26
2021-09-29 11:28:05,652 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
Sleep 29971 ms
2021-09-29 11:28:05,709 [ChunkWriter-73-0] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
triggerHeartbeat: interrupt stateMachineThread for ICR report
2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
heartbeat interval : 30000, next HB 6719161818
2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: Starting to 
execute the task 27
2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] ERROR 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
execute task.wait is interrupted.
2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] INFO 
org.apache.hadoop.ozone.container.common.statemachine.DatanodeStateMachine: 
heartbeat interval : 30000, next HB 6719161818
2021-09-29 11:28:05,709 [DatanodeStateMachineDaemonThread] WARN 
org.apache.hadoop.ozone.container.common.statemachine.StateContext: No 
available thread in pool for past 30 seconds.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to