[ https://issues.apache.org/jira/browse/HDDS-3022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17730459#comment-17730459 ]

Sumit Agrawal commented on HDDS-3022:
-------------------------------------

I verified that the pipeline is now closed when the disk is full, so the issue 
no longer occurs. The datanode log from the verification run:

 
{code:java}
2023-06-08 14:14:17,285 ERROR ratis.ContainerStateMachine 
(ContainerStateMachine.java:takeSnapshot(337)) - group-096316BFE74D: Failed to 
write snapshot at:(t:1, i:13) file 
/Volumes/TestVol/dn2/76b32f42-0d08-4533-bc0f-096316bfe74d/sm/snapshot.1_13
2023-06-08 14:14:17,285 ERROR impl.StateMachineUpdater 
(StateMachineUpdater.java:takeSnapshot(286)) - 
954f6bab-8ef1-47df-92a5-bdcd90d3a571@group-096316BFE74D-StateMachineUpdater: 
Failed to take snapshot
java.io.FileNotFoundException: 
/Volumes/TestVol/dn2/76b32f42-0d08-4533-bc0f-096316bfe74d/sm/snapshot.1_13 (No 
space left on device)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
    at 
org.apache.hadoop.ozone.container.common.transport.server.ratis.ContainerStateMachine.takeSnapshot(ContainerStateMachine.java:331)
    at 
org.apache.ratis.server.impl.StateMachineUpdater.takeSnapshot(StateMachineUpdater.java:274)
    at 
org.apache.ratis.server.impl.StateMachineUpdater.checkAndTakeSnapshot(StateMachineUpdater.java:266)
    at 
org.apache.ratis.server.impl.StateMachineUpdater.run(StateMachineUpdater.java:188)
    at java.lang.Thread.run(Thread.java:748)
2023-06-08 14:14:17,290 INFO  server.RaftServer$Division 
(ServerState.java:close(472)) - 
954f6bab-8ef1-47df-92a5-bdcd90d3a571@group-096316BFE74D: closes. applyIndex: 13
2023-06-08 14:14:17,721 INFO  segmented.SegmentedRaftLogWorker 
(SegmentedRaftLogWorker.java:close(252)) - 
954f6bab-8ef1-47df-92a5-bdcd90d3a571@group-096316BFE74D-SegmentedRaftLogWorker 
close()
2023-06-08 14:14:17,746 INFO  keyvalue.KeyValueContainer 
(KeyValueContainer.java:flushAndSyncDB(446)) - Container 4 is synced with bcsId 
4.
2023-06-08 14:14:17,753 INFO  keyvalue.KeyValueContainer 
(KeyValueContainer.java:flushAndSyncDB(446)) - Container 4 is synced with bcsId 
4.
2023-06-08 14:14:17,774 INFO  keyvalue.KeyValueContainer 
(KeyValueContainer.java:flushAndSyncDB(446)) - Container 5 is synced with bcsId 
10.
2023-06-08 14:14:17,778 INFO  keyvalue.KeyValueContainer 
(KeyValueContainer.java:flushAndSyncDB(446)) - Container 5 is synced with bcsId 
10.
2023-06-08 14:14:17,786 INFO  server.RaftServer$Division 
(RaftServerImpl.java:groupRemove(436)) - 
954f6bab-8ef1-47df-92a5-bdcd90d3a571@group-096316BFE74D: Succeed to remove 
RaftStorageDirectory Storage Directory 
/Volumes/TestVol/dn2/76b32f42-0d08-4533-bc0f-096316bfe74d
2023-06-08 14:14:17,790 INFO  commandhandler.ClosePipelineCommandHandler 
(ClosePipelineCommandHandler.java:lambda$handle$0(87)) - Close Pipeline 
PipelineID=76b32f42-0d08-4533-bc0f-096316bfe74d command on datanode 
954f6bab-8ef1-47df-92a5-bdcd90d3a571.
{code}

> Datanode unable to close Pipeline after disk out of space
> ---------------------------------------------------------
>
>                 Key: HDDS-3022
>                 URL: https://issues.apache.org/jira/browse/HDDS-3022
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>    Affects Versions: 0.5.0
>            Reporter: Vivek Ratnavel Subramanian
>            Assignee: Sumit Agrawal
>            Priority: Critical
>              Labels: TriagePending
>         Attachments: ozone_logs.zip
>
>
> The Datanode gets into a loop and keeps throwing errors while trying to close 
> the pipeline:
> {code:java}
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from  
> FOLLOWER to CANDIDATE at term 6240 for changeToCandidate
> 2020-02-14 00:25:10,208 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=02e7e10e-2d50-4ace-a18b-701265ec9f07.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 is in candidate state for 31898494ms
> 2020-02-14 00:25:10,208 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start LeaderElection
> 2020-02-14 00:25:10,223 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> begin an election at term 6241 for 0: 
> [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,259 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-LeaderElection37032: 
> Election REJECTED; received 0 response(s) [] and 2 exception(s); 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07:t6241, leader=null, 
> voted=285cac09-7622-45e6-be02-b3c68ebf8b10, 
> raftlog=285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07-SegmentedRaftLog:OPENED:c4,f4,i14,
>  conf=0: [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-701265EC9F07 not found.
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-701265EC9F07: changes role from 
> CANDIDATE to FOLLOWER at term 6241 for DISCOVERED_A_NEW_TERM
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown LeaderElection
> 2020-02-14 00:25:10,270 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start FollowerState
> 2020-02-14 00:25:10,680 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-DD847EC75388->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12669,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:10,752 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=7ad5ce51-d3fa-4e71-99f2-dd847ec75388.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1-9078-28497a08ab85 for 31623987ms 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e for 31618878ms
> 2020-02-14 00:25:10,894 INFO org.apache.ratis.server.impl.FollowerState: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-FollowerState: change 
> to CANDIDATE, lastRpcTime:5021ms, electionTimeout:5017ms
> 2020-02-14 00:25:10,894 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown FollowerState
> 2020-02-14 00:25:10,894 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9: changes role from  
> FOLLOWER to CANDIDATE at term 6220 for changeToCandidate
> 2020-02-14 00:25:10,894 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=179ac1d0-e5d5-4898-bef7-0068fd2ea2c9.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 is in candidate state for 31805092ms
> 2020-02-14 00:25:10,894 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start LeaderElection
> 2020-02-14 00:25:10,917 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-LeaderElection37033: 
> begin an election at term 6221 for 0: 
> [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-LeaderElection37033 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-0068FD2EA2C9 not found.
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-LeaderElection37033 
> got exception when requesting votes: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-0068FD2EA2C9 not found.
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.LeaderElection: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-LeaderElection37033: 
> Election REJECTED; received 0 response(s) [] and 2 exception(s); 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9:t6221, leader=null, 
> voted=285cac09-7622-45e6-be02-b3c68ebf8b10, 
> raftlog=285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9-SegmentedRaftLog:OPENED:c0,f0,i8,
>  conf=0: [d432c890-5ec4-4cf1-9078-28497a08ab85:10.65.6.227:9858, 
> 285cac09-7622-45e6-be02-b3c68ebf8b10:10.65.24.80:9858, 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e:10.65.8.165:9858], old=null
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 0: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e: group-0068FD2EA2C9 not found.
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.LeaderElection:   
> Exception 1: java.util.concurrent.ExecutionException: 
> org.apache.ratis.thirdparty.io.grpc.StatusRuntimeException: INTERNAL: 
> d432c890-5ec4-4cf1-9078-28497a08ab85: group-0068FD2EA2C9 not found.
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.RaftServerImpl: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-0068FD2EA2C9: changes role from 
> CANDIDATE to FOLLOWER at term 6221 for DISCOVERED_A_NEW_TERM
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown LeaderElection
> 2020-02-14 00:25:10,921 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: start FollowerState
> 2020-02-14 00:25:11,134 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-DD847EC75388->cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12669,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:11,218 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=7ad5ce51-d3fa-4e71-99f2-dd847ec75388.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1-9078-28497a08ab85 for 31624453ms 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e for 31619344ms
> 2020-02-14 00:25:11,347 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-2338B042C07B->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12579,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:11,361 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-2338B042C07B->cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12577,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:11,399 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=6a851c59-0345-4ad8-ac31-2338b042c07b.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1-9078-28497a08ab85 for 31396085ms 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e for 31391530ms
> 2020-02-14 00:25:11,406 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=6a851c59-0345-4ad8-ac31-2338b042c07b.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1-9078-28497a08ab85 for 31396092ms 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e for 31391537ms
> 2020-02-14 00:25:11,423 WARN org.apache.ratis.grpc.server.GrpcLogAppender: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-BA1E8724EE74->d432c890-5ec4-4cf1-9078-28497a08ab85-GrpcLogAppender:
>  HEARTBEAT appendEntries Timeout, 
> request=AppendEntriesRequest:cid=12817,entriesCount=0,lastEntry=null
> 2020-02-14 00:25:11,490 ERROR 
> org.apache.hadoop.ozone.container.common.transport.server.ratis.XceiverServerRatis:
>  pipeline Action CLOSE  on pipeline 
> PipelineID=1ed1be53-b526-41af-bdf9-ba1e8724ee74.Reason : 
> 285cac09-7622-45e6-be02-b3c68ebf8b10 has not seen follower/s 
> d432c890-5ec4-4cf1-9078-28497a08ab85 for 31946345ms 
> cabbdef8-ed6c-4fc7-b7b2-d1ddd07da47e for 31945978ms
> 2020-02-14 00:25:11,909 INFO org.apache.ratis.server.impl.FollowerState: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10@group-D506E1A1894E-FollowerState: change 
> to CANDIDATE, lastRpcTime:5094ms, electionTimeout:5093ms
> 2020-02-14 00:25:11,909 INFO org.apache.ratis.server.impl.RoleInfo: 
> 285cac09-7622-45e6-be02-b3c68ebf8b10: shutdown FollowerState
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
