[
https://issues.apache.org/jira/browse/HDDS-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614948#comment-16614948
]
Shashikant Banerjee commented on HDDS-461:
------------------------------------------
The issue seems to occur when a closeContainer command is received at a
Datanode: it gets queued to the RaftServer and fails with a NOT LEADER
exception, which is ignored on the assumption that the leader will close the
container. However, the datanode may die during this time, and since the
earlier exception has already been ignored, SCM will never retry the command on
other nodes.
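
To make the sequence concrete, here is a minimal, self-contained Python model of
the scenario. It is only a sketch and not Ozone or Ratis code: the names
Datanode, NotLeaderError, send_close and run_scenario are hypothetical, and it
assumes the node that dies is the Raft leader, which the comment above does not
state explicitly.

```python
class NotLeaderError(Exception):
    """Hypothetical stand-in for the NOT LEADER exception described above."""


class Datanode:
    """Toy datanode; only the leader can close the container."""

    def __init__(self, name, is_leader=False):
        self.name = name
        self.is_leader = is_leader
        self.alive = True

    def close_container(self, container):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        if not self.is_leader:
            # A follower cannot commit the close through the Raft ring.
            raise NotLeaderError(self.name)
        container["state"] = "CLOSED"


def send_close(container, target):
    """Send closeContainer to one datanode; return True if it was closed."""
    try:
        target.close_container(container)
        return True
    except NotLeaderError:
        # Swallowed on the assumption that the leader will close the container.
        return False


def run_scenario():
    container = {"id": 13, "state": "CLOSING"}
    follower = Datanode("dn-follower")
    leader = Datanode("dn-leader", is_leader=True)

    closed = send_close(container, follower)  # NOT LEADER, silently ignored
    leader.alive = False                      # assumed: leader dies before closing
    # Nothing recorded the failure, so nothing triggers a retry elsewhere.
    return container, closed
```

In this model the container is still in the CLOSING state after run_scenario()
returns, because the only error that could have prompted a retry was discarded.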
> container remains in CLOSING state in SCM forever
> -------------------------------------------------
>
> Key: HDDS-461
> URL: https://issues.apache.org/jira/browse/HDDS-461
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Affects Versions: 0.2.1
> Reporter: Nilotpal Nandi
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: all-node-ozone-logs-1536920345.tar.gz
>
>
> Container id # 13's state is not changing from CLOSING to CLOSED.
> {noformat}
> [root@ctr-e138-1518143905142-459606-01-000002 bin]# ./ozone scmcli info 13
> raft.rpc.type = GRPC (default)
> raft.grpc.message.size.max = 33554432 (custom)
> raft.client.rpc.retryInterval = 300 ms (default)
> raft.client.async.outstanding-requests.max = 100 (default)
> raft.client.async.scheduler-threads = 3 (default)
> raft.grpc.flow.control.window = 1MB (=1048576) (default)
> raft.grpc.message.size.max = 33554432 (custom)
> raft.client.rpc.request.timeout = 3000 ms (default)
> Container id: 13
> Container State: OPEN
> Container Path:
> /tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/13/metadata
> Container Metadata:
> LeaderID: ctr-e138-1518143905142-459606-01-000003.hwx.site
> Datanodes:
> [ctr-e138-1518143905142-459606-01-000007.hwx.site,ctr-e138-1518143905142-459606-01-000008.hwx.site,ctr-e138-1518143905142-459606-01-000003.hwx.site]{noformat}
>
> Snippet of scmcli list:
> {noformat}
> {
> "state" : "CLOSING",
> "replicationFactor" : "THREE",
> "replicationType" : "RATIS",
> "allocatedBytes" : 4831838208,
> "usedBytes" : 4831838208,
> "numberOfKeys" : 0,
> "lastUsed" : 4391827471,
> "stateEnterTime" : 5435591457,
> "owner" : "f8332db1-b8b1-4077-a9ea-097033d074b7",
> "containerID" : 13,
> "deleteTransactionId" : 0,
> "containerOpen" : true
> }{noformat}