ivandika3 commented on code in PR #5725:
URL: https://github.com/apache/ozone/pull/5725#discussion_r1432318897
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/transport/server/ratis/XceiverServerRatis.java:
##########
@@ -703,11 +703,11 @@ private void handlePipelineFailure(RaftGroupId groupId,
}
triggerPipelineClose(groupId, msg,
- ClosePipelineInfo.Reason.PIPELINE_FAILED, false);
+ ClosePipelineInfo.Reason.PIPELINE_FAILED);
Review Comment:
@sumitagrawl I have updated `XceiverServerRatis` to keep track of active
pipelines and their relevant information (i.e. whether the pipeline is pending
close and whether the current datanode is the leader of the pipeline).
I tested manually by shutting down one of the datanodes in an active
pipeline. The leader datanode triggered the pipeline close immediately due to
the `notifyFollowerSlowness` hook, but the subsequent pipeline close commands
are triggered in the following heartbeats.
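For illustration, the bookkeeping described above could be modeled roughly as
below. This is a minimal sketch and not the code in this PR: the class
`ActivePipelineTracker`, its methods, and the use of `UUID` as the pipeline key
are all hypothetical names chosen for the example. It assumes the first close
request for a pipeline should be reported immediately, while later requests
for the same pipeline wait for the regular heartbeat.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch of per-pipeline tracking; not the XceiverServerRatis code. */
final class ActivePipelineTracker {

  /** State kept per active pipeline (names are assumptions for this example). */
  private static final class PipelineState {
    // Whether the current datanode is the leader of the pipeline.
    private final AtomicBoolean leader = new AtomicBoolean(false);
    // Whether a close has already been requested for the pipeline.
    private final AtomicBoolean pendingClose = new AtomicBoolean(false);
  }

  private final Map<UUID, PipelineState> activePipelines =
      new ConcurrentHashMap<>();

  /** Starts tracking a pipeline; a no-op if it is already tracked. */
  void addPipeline(UUID pipelineId) {
    activePipelines.putIfAbsent(pipelineId, new PipelineState());
  }

  /** Stops tracking a pipeline, e.g. once it has been removed. */
  void removePipeline(UUID pipelineId) {
    activePipelines.remove(pipelineId);
  }

  /** Records a leadership change; ignored for untracked pipelines. */
  void setLeader(UUID pipelineId, boolean isLeader) {
    PipelineState state = activePipelines.get(pipelineId);
    if (state != null) {
      state.leader.set(isLeader);
    }
  }

  boolean isLeader(UUID pipelineId) {
    PipelineState state = activePipelines.get(pipelineId);
    return state != null && state.leader.get();
  }

  boolean isPendingClose(UUID pipelineId) {
    PipelineState state = activePipelines.get(pipelineId);
    return state != null && state.pendingClose.get();
  }

  /**
   * Marks the pipeline as pending close.
   *
   * @return true only for the first request on a tracked pipeline, so the
   *         caller can report that one immediately; false for repeated
   *         requests and for untracked pipelines.
   */
  boolean markPendingClose(UUID pipelineId) {
    PipelineState state = activePipelines.get(pipelineId);
    return state != null && state.pendingClose.compareAndSet(false, true);
  }
}
```

With a tracker like this, a caller would send the close action right away only
when `markPendingClose` returns `true`, and leave repeated triggers (such as the
repeated follower-slowness notifications in the DN log below) to the next
regular heartbeat.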
SCM pipeline action close log (entries separated by the 30s heartbeat
interval), received from the pipeline leader DN:
```
2023-12-20 14:14:53,962
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:15:23,963
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:15:53,960
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:16:23,960
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:16:52,409
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:16:52,413
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:16:52,595
[scm1-EventQueue-PipelineActionsForPipelineActionHandler] INFO
org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
action CLOSE for PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a from datanode
a3fcdd27-8244-4b7a-840c-037dac8c6337. Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
```
The DN was restarted at 14:16:53, which may be why SCM received multiple
heartbeats from the same DN around that time.
DN log of the pipeline close caused by the unresponsive follower (triggered
multiple times within a single heartbeat interval):
```
2023-12-20 14:14:53,956
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 302697ms
2023-12-20 14:15:02,517
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 311262ms
2023-12-20 14:15:09,112
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 317858ms
2023-12-20 14:15:17,303
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 326049ms
2023-12-20 14:15:21,319
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 330065ms
2023-12-20 14:15:26,097
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 334843ms
2023-12-20 14:15:33,563
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 342308ms
2023-12-20 14:15:38,610
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 347356ms
2023-12-20 14:15:47,337
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 356082ms
2023-12-20 14:15:54,298
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 363044ms
2023-12-20 14:15:58,898
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 367643ms
2023-12-20 14:16:04,575
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 373321ms
2023-12-20 14:16:11,602
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 380348ms
2023-12-20 14:16:19,886
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 388632ms
2023-12-20 14:16:24,493
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 393239ms
2023-12-20 14:16:31,369
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 400114ms
2023-12-20 14:16:37,345
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 406091ms
2023-12-20 14:16:45,904
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 414649ms
2023-12-20 14:16:51,766
[a3fcdd27-8244-4b7a-840c-037dac8c6337@group-661AE7EEF57A->a2289e91-7fe6-49e8-ae84-fb33f39719d0-GrpcLogAppender-LogAppenderDaemon]
ERROR
org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
pipeline Action CLOSE on pipeline
PipelineID=38c214e0-9a5e-4b89-b114-661ae7eef57a.Reason :
a3fcdd27-8244-4b7a-840c-037dac8c6337 has not seen follower/s
a2289e91-7fe6-49e8-ae84-fb33f39719d0 for 420512ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]