[ https://issues.apache.org/jira/browse/HDDS-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614948#comment-16614948 ]

Shashikant Banerjee commented on HDDS-461:
------------------------------------------

The issue appears to occur when a closeContainer command is received at a 
Datanode: the command is queued to the RaftServer and fails with a NOT LEADER 
exception, which is ignored on the assumption that the leader will close the 
container. However, the datanode may die during this time, and since the 
earlier exception has already been ignored, SCM will never retry the command 
on other nodes.
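
To make the failure mode concrete, here is a minimal, self-contained sketch. 
It is not the actual SCM or Ratis code: the class, enums and method names 
below are all hypothetical, and each datanode is reduced to the reply it would 
give to a close request. It only contrasts "send once and ignore NOT LEADER" 
with one possible alternative, retrying on the other nodes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model only; none of these names exist in Ozone or Ratis.
public class CloseContainerSketch {
    enum State { CLOSING, CLOSED }

    // Assumed outcomes of submitting a close request to one datanode.
    enum Reply { CLOSED, NOT_LEADER, UNREACHABLE }

    // Behaviour described above: the command goes to one node and a
    // NOT_LEADER failure is swallowed, so nothing is ever retried.
    static State closeIgnoringNotLeader(List<Reply> replicas) {
        Reply first = replicas.get(0);
        return first == Reply.CLOSED ? State.CLOSED : State.CLOSING;
    }

    // Possible alternative: try each node until one confirms the close.
    static State closeWithRetry(List<Reply> replicas) {
        for (Reply reply : replicas) {
            if (reply == Reply.CLOSED) {
                return State.CLOSED;
            }
        }
        return State.CLOSING; // no node could close the container
    }

    public static void main(String[] args) {
        // Assumed scenario: the first node is a follower, the second node
        // has died, and the third node is able to close the container.
        List<Reply> replicas = Arrays.asList(
            Reply.NOT_LEADER, Reply.UNREACHABLE, Reply.CLOSED);
        System.out.println(closeIgnoringNotLeader(replicas)); // CLOSING
        System.out.println(closeWithRetry(replicas));         // CLOSED
    }
}
```

In this simplified model the container stays in CLOSING under the first 
policy, which matches the symptom reported in the issue.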

> container remains in CLOSING state in SCM forever
> -------------------------------------------------
>
>                 Key: HDDS-461
>                 URL: https://issues.apache.org/jira/browse/HDDS-461
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>    Affects Versions: 0.2.1
>            Reporter: Nilotpal Nandi
>            Assignee: Shashikant Banerjee
>            Priority: Major
>         Attachments: all-node-ozone-logs-1536920345.tar.gz
>
>
> Container id # 13's state is not changing from CLOSING to CLOSED.
> {noformat}
> [root@ctr-e138-1518143905142-459606-01-000002 bin]# ./ozone scmcli info 13
> raft.rpc.type = GRPC (default)
> raft.grpc.message.size.max = 33554432 (custom)
> raft.client.rpc.retryInterval = 300 ms (default)
> raft.client.async.outstanding-requests.max = 100 (default)
> raft.client.async.scheduler-threads = 3 (default)
> raft.grpc.flow.control.window = 1MB (=1048576) (default)
> raft.grpc.message.size.max = 33554432 (custom)
> raft.client.rpc.request.timeout = 3000 ms (default)
> Container id: 13
> Container State: OPEN
> Container Path: /tmp/hadoop-root/dfs/data/hdds/de0a9e01-4a12-40e3-b567-51b9bd83248e/current/containerDir0/13/metadata
> Container Metadata:
> LeaderID: ctr-e138-1518143905142-459606-01-000003.hwx.site
> Datanodes: [ctr-e138-1518143905142-459606-01-000007.hwx.site,ctr-e138-1518143905142-459606-01-000008.hwx.site,ctr-e138-1518143905142-459606-01-000003.hwx.site]{noformat}
>  
> snippet of scmcli list :
> {noformat}
> {
>  "state" : "CLOSING",
>  "replicationFactor" : "THREE",
>  "replicationType" : "RATIS",
>  "allocatedBytes" : 4831838208,
>  "usedBytes" : 4831838208,
>  "numberOfKeys" : 0,
>  "lastUsed" : 4391827471,
>  "stateEnterTime" : 5435591457,
>  "owner" : "f8332db1-b8b1-4077-a9ea-097033d074b7",
>  "containerID" : 13,
>  "deleteTransactionId" : 0,
>  "containerOpen" : true
> }{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

