[jira] [Commented] (HDDS-343) Containers are stuck in closing state in scm

Nanda kumar (JIRA) Thu, 09 Aug 2018 03:58:09 -0700


    [ 
https://issues.apache.org/jira/browse/HDDS-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574688#comment-16574688
 ]


Nanda kumar commented on HDDS-343:
----------------------------------

[~elek], without HDDS-245 we are not even processing the container report. It 
is true that, even after the container is closed on the datanode, its state in 
SCM is not updated from CLOSING to CLOSED. But that is not why SCM appears to 
retry the close container command. The reason is that the datanode has sent 
close containerActions twice and SCM is processing both of them. When SCM 
realizes that it has already handled the close container command once for that 
particular container, it ignores the second close containerAction and prints 
the message: {{container with id : 2 is in CLOSING state and need not be closed.}}.
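
A minimal sketch of that duplicate handling, for illustration only. The class, 
field and method names (CloseActionSketch, containers, closeCommands, 
onCloseAction) are hypothetical and are not the real SCM classes; only the log 
message text is taken from the output quoted in this issue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical sketch; these are not the actual SCM classes. */
class CloseActionSketch {
    enum State { OPEN, CLOSING, CLOSED }

    // SCM's view of each container, keyed by container id (assumed structure).
    final Map<Long, State> containers = new HashMap<>();
    // Close commands waiting to go out in the next heartbeat response.
    final Queue<Long> closeCommands = new ArrayDeque<>();

    /** Returns true if a close command was queued, false if the action was ignored. */
    boolean onCloseAction(long id) {
        State state = containers.get(id);
        if (state == null) {
            return false; // unknown container: nothing to close in this sketch
        }
        if (state != State.OPEN) {
            // A repeated close action for the same container lands here.
            System.out.println("container with id : " + id + " is in "
                + state + " state and need not be closed.");
            return false;
        }
        containers.put(id, State.CLOSING);
        closeCommands.add(id);
        return true;
    }

    public static void main(String[] args) {
        CloseActionSketch scm = new CloseActionSketch();
        scm.containers.put(2L, State.OPEN);
        scm.onCloseAction(2L); // first action: OPEN -> CLOSING, command queued
        scm.onCloseAction(2L); // second action: ignored, message printed
        System.out.println("queued commands: " + scm.closeCommands.size());
    }
}
```

In this sketch the second action never queues another command, so the repeated 
log line by itself does not mean the close command is being sent again.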

Consider the scenario:

1. Container-1 is almost full
2. A client is writing a chunk to container-1
3. The datanode realizes that container-1 is almost full and sends a close 
containerAction to SCM in the next heartbeat
4. SCM processes the containerActions, moves the container to CLOSING state and 
adds a CloseContainerCommand to the queue, which will be sent to the datanode in 
the next heartbeat response
5. Meanwhile, the datanode receives another write chunk request for the same 
container, again realizes that container-1 is almost full and sends another 
close containerAction to SCM in the next heartbeat
6. SCM receives the close containerAction, notices that the container is 
already in CLOSING state and ignores the action. In response to this heartbeat, 
SCM sends the close container command queued in step 4 to the datanode
7. The datanode closes the container
8. After this, the datanode will not send a close containerAction for this 
container, as the container is already closed
9. The datanode sends a container report to SCM in which this container is 
marked as closed
10. While processing the container report, SCM should update the container state 
from CLOSING to CLOSED (this has to be done as part of HDDS-245)
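
Steps 9 and 10 could look roughly like the sketch below. This is only an 
assumption about the shape of the change, not the HDDS-245 patch: the names 
(ContainerReportSketch, scmView, processReport) are hypothetical, and it 
ignores replication, so one report is enough to close a container.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; this is not the actual ContainerMapping code. */
class ContainerReportSketch {
    enum State { OPEN, CLOSING, CLOSED }

    // SCM's view of each container, keyed by container id (assumed structure).
    final Map<Long, State> scmView = new HashMap<>();

    /** Applies one datanode report; returns how many containers moved to CLOSED. */
    int processReport(Map<Long, State> reported) {
        int updated = 0;
        for (Map.Entry<Long, State> entry : reported.entrySet()) {
            State known = scmView.get(entry.getKey());
            // Only finish a close that SCM itself started.
            if (known == State.CLOSING && entry.getValue() == State.CLOSED) {
                scmView.put(entry.getKey(), State.CLOSED);
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        ContainerReportSketch scm = new ContainerReportSketch();
        scm.scmView.put(1L, State.CLOSING);
        scm.scmView.put(2L, State.CLOSING);
        scm.scmView.put(3L, State.OPEN);

        // Same container states as the containerReport in the quoted log.
        Map<Long, State> report = new HashMap<>();
        report.put(1L, State.CLOSED);
        report.put(2L, State.CLOSED);
        report.put(3L, State.OPEN);

        System.out.println("moved to CLOSED: " + scm.processReport(report));
    }
}
```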

> Containers are stuck in closing state in scm
> --------------------------------------------
>
>                 Key: HDDS-343
>                 URL: https://issues.apache.org/jira/browse/HDDS-343
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>          Components: SCM
>            Reporter: Elek, Marton
>            Priority: Blocker
>             Fix For: 0.2.1
>
>
> Containers currently cannot be closed.
> The datanode is closing the containers and sending the CLOSED state in the 
> container report, but SCM doesn't register that the state is closed and 
> keeps sending the close command again and again.
> I think ContainerMapping.processContainerReport should be improved.
> {code}
> scm_1           | --> RPC message request: SCMHeartbeatRequestProto from 
> 172.25.0.2:33912
> scm_1           | datanodeDetails {
> scm_1           |   uuid: "9c8f80bd-9424-4d74-99ef-a2bd58e66d7f"
> scm_1           |   ipAddress: "172.25.0.2"
> scm_1           |   hostName: "365fd1f44f0b"
> scm_1           |   ports {
> scm_1           |     name: "STANDALONE"
> scm_1           |     value: 9859
> scm_1           |   }
> scm_1           |   ports {
> scm_1           |     name: "RATIS"
> scm_1           |     value: 9858
> scm_1           |   }
> scm_1           |   ports {
> scm_1           |     name: "REST"
> scm_1           |     value: 9880
> scm_1           |   }
> scm_1           | }
> scm_1           | nodeReport {
> scm_1           |   storageReport {
> scm_1           |     storageUuid: "DS-61e76107-85c5-437a-95a7-aeb8b3e7827f"
> scm_1           |     storageLocation: "/tmp/hadoop-hadoop/dfs/data"
> scm_1           |     capacity: 491630870528
> scm_1           |     scmUsed: 2708828160
> scm_1           |     remaining: 24263614464
> scm_1           |     storageType: DISK
> scm_1           |     failed: false
> scm_1           |   }
> scm_1           | }
> scm_1           | containerReport {
> scm_1           |   reports {
> scm_1           |     containerID: 1
> scm_1           |     used: 1061158912
> scm_1           |     readCount: 0
> scm_1           |     writeCount: 64
> scm_1           |     readBytes: 0
> scm_1           |     writeBytes: 1061158912
> scm_1           |     state: CLOSED
> scm_1           |   }
> scm_1           |   reports {
> scm_1           |     containerID: 2
> scm_1           |     used: 1048576000
> scm_1           |     readCount: 0
> scm_1           |     writeCount: 64
> scm_1           |     readBytes: 0
> scm_1           |     writeBytes: 1048576000
> scm_1           |     state: CLOSED
> scm_1           |   }
> scm_1           |   reports {
> scm_1           |     containerID: 3
> scm_1           |     used: 511705088
> scm_1           |     readCount: 0
> scm_1           |     writeCount: 32
> scm_1           |     readBytes: 0
> scm_1           |     writeBytes: 511705088
> scm_1           |     state: OPEN
> scm_1           |   }
> scm_1           | }
> scm_1           | commandStatusReport {
> scm_1           | }
> scm_1           | containerActions {
> scm_1           |   containerActions {
> scm_1           |     containerID: 1
> scm_1           |     action: CLOSE
> scm_1           |     reason: CONTAINER_FULL
> scm_1           |   }
> scm_1           |   containerActions {
> scm_1           |     containerID: 2
> scm_1           |     action: CLOSE
> scm_1           |     reason: CONTAINER_FULL
> scm_1           |   }
> scm_1           | }
> scm_1           | 
> scm_1           | --> RPC message response: SCMHeartbeatRequestProto to 
> 172.25.0.2:33912
> scm_1           | datanodeUUID: "9c8f80bd-9424-4d74-99ef-a2bd58e66d7f"
> scm_1           | 
> scm_1           | 2018-08-08 16:22:51 INFO  CloseContainerEventHandler:56 - 
> Close container Event triggered for container : 1
> scm_1           | 2018-08-08 16:22:51 INFO  CloseContainerEventHandler:105 - 
> container with id : 1 is in CLOSING state and need not be closed.
> scm_1           | 2018-08-08 16:22:51 INFO  CloseContainerEventHandler:56 - 
> Close container Event triggered for container : 2
> scm_1           | 2018-08-08 16:22:51 INFO  CloseContainerEventHandler:105 - 
> container with id : 2 is in CLOSING state and need not be closed.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

