[
https://issues.apache.org/jira/browse/HDDS-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574688#comment-16574688
]
Nanda kumar commented on HDDS-343:
----------------------------------
[~elek], without HDDS-245 we are not even processing container reports. It is
true that even after the container is closed on the datanode, its state in SCM
is not updated from CLOSING to CLOSED. But this is not the reason for SCM
retrying the close container command. The reason is that the datanode has sent
the close containerAction twice and SCM is processing both. When SCM realizes
that it has already executed the close container command once for that
container, it ignores the second close containerAction and prints the
message: {{container with id : 2 is in CLOSING state and need not be closed.}}
Consider the scenario:
1. Container-1 is almost full
2. A client is writing a chunk to container-1
3. Datanode realizes that container-1 is almost full and sends a close
containerAction to SCM in the next heartbeat
4. SCM processes the containerActions, moves the container to CLOSING state and
adds a CloseContainerCommand to the queue, which will be sent to the datanode
in the next heartbeat response
5. Meanwhile, datanode receives another write chunk request for the same
container, again realizes that container-1 is almost full and sends another
close containerAction to SCM in the next heartbeat
6. SCM receives the close containerAction, notices that the container is
already in CLOSING state and ignores the action. In response to this heartbeat
SCM sends the close container command (queued in step 4) to the datanode
7. Datanode closes the container
8. After this, datanode will not send a close containerAction for this
container, as the container is already closed
9. Datanode sends a container report to SCM in which this container is marked
as closed
10. While processing the container report, SCM should update the container
state from CLOSING to CLOSED (has to be done as part of HDDS-245)
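The SCM side of the scenario above can be sketched roughly as below. This is only an illustration of the state handling, not the actual SCM code: the class {{CloseContainerSketch}} and the methods {{onCloseAction}} and {{onContainerReport}} are hypothetical names made up for this example, and the real logic lives in CloseContainerEventHandler and the container report processing.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Hypothetical, simplified model of how SCM could track container state. */
public class CloseContainerSketch {

    public enum State { OPEN, CLOSING, CLOSED }

    /** Stand-in for SCM's container bookkeeping (assumed names, not the real API). */
    public static class Scm {
        private final Map<Long, State> states = new HashMap<>();
        // Close commands waiting to go out with the next heartbeat response.
        private final Queue<Long> closeCommands = new ArrayDeque<>();

        public void addContainer(long id) {
            states.put(id, State.OPEN);
        }

        public State stateOf(long id) {
            return states.get(id);
        }

        public int pendingCloseCommands() {
            return closeCommands.size();
        }

        /**
         * Steps 4 and 6: handle a close containerAction from a datanode.
         * Returns true if a close command was queued, false if the action
         * was ignored because the container is not OPEN any more.
         */
        public boolean onCloseAction(long id) {
            if (states.get(id) != State.OPEN) {
                // Duplicate action (or unknown container): nothing to do.
                return false;
            }
            states.put(id, State.CLOSING);
            closeCommands.add(id);
            return true;
        }

        /** Step 10: a container report marking the container CLOSED ends CLOSING. */
        public void onContainerReport(long id, State reported) {
            if (reported == State.CLOSED && states.get(id) == State.CLOSING) {
                states.put(id, State.CLOSED);
            }
        }
    }

    public static void main(String[] args) {
        Scm scm = new Scm();
        scm.addContainer(1L);

        System.out.println(scm.onCloseAction(1L));       // first action is processed
        System.out.println(scm.onCloseAction(1L));       // second action is ignored
        System.out.println(scm.pendingCloseCommands());  // only one command queued

        scm.onContainerReport(1L, State.CLOSED);
        System.out.println(scm.stateOf(1L));
    }
}
```

In this sketch the second close action is a no-op because the container has already left the OPEN state, and only the container report moves it from CLOSING to CLOSED. Without the report handling (the HDDS-245 part) the container would stay in CLOSING forever, which matches what is seen in this issue.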
> Containers are stuck in closing state in scm
> --------------------------------------------
>
> Key: HDDS-343
> URL: https://issues.apache.org/jira/browse/HDDS-343
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM
> Reporter: Elek, Marton
> Priority: Blocker
> Fix For: 0.2.1
>
>
> Containers currently cannot be closed.
> The datanode is closing the containers and sending the CLOSED state in the
> container report, but SCM doesn't register that the state is closed and
> keeps sending the close command again and again.
> I think the ContainerMapping.processContainerReport should be improved.
> {code}
> scm_1 | --> RPC message request: SCMHeartbeatRequestProto from
> 172.25.0.2:33912
> scm_1 | datanodeDetails {
> scm_1 | uuid: "9c8f80bd-9424-4d74-99ef-a2bd58e66d7f"
> scm_1 | ipAddress: "172.25.0.2"
> scm_1 | hostName: "365fd1f44f0b"
> scm_1 | ports {
> scm_1 | name: "STANDALONE"
> scm_1 | value: 9859
> scm_1 | }
> scm_1 | ports {
> scm_1 | name: "RATIS"
> scm_1 | value: 9858
> scm_1 | }
> scm_1 | ports {
> scm_1 | name: "REST"
> scm_1 | value: 9880
> scm_1 | }
> scm_1 | }
> scm_1 | nodeReport {
> scm_1 | storageReport {
> scm_1 | storageUuid: "DS-61e76107-85c5-437a-95a7-aeb8b3e7827f"
> scm_1 | storageLocation: "/tmp/hadoop-hadoop/dfs/data"
> scm_1 | capacity: 491630870528
> scm_1 | scmUsed: 2708828160
> scm_1 | remaining: 24263614464
> scm_1 | storageType: DISK
> scm_1 | failed: false
> scm_1 | }
> scm_1 | }
> scm_1 | containerReport {
> scm_1 | reports {
> scm_1 | containerID: 1
> scm_1 | used: 1061158912
> scm_1 | readCount: 0
> scm_1 | writeCount: 64
> scm_1 | readBytes: 0
> scm_1 | writeBytes: 1061158912
> scm_1 | state: CLOSED
> scm_1 | }
> scm_1 | reports {
> scm_1 | containerID: 2
> scm_1 | used: 1048576000
> scm_1 | readCount: 0
> scm_1 | writeCount: 64
> scm_1 | readBytes: 0
> scm_1 | writeBytes: 1048576000
> scm_1 | state: CLOSED
> scm_1 | }
> scm_1 | reports {
> scm_1 | containerID: 3
> scm_1 | used: 511705088
> scm_1 | readCount: 0
> scm_1 | writeCount: 32
> scm_1 | readBytes: 0
> scm_1 | writeBytes: 511705088
> scm_1 | state: OPEN
> scm_1 | }
> scm_1 | }
> scm_1 | commandStatusReport {
> scm_1 | }
> scm_1 | containerActions {
> scm_1 | containerActions {
> scm_1 | containerID: 1
> scm_1 | action: CLOSE
> scm_1 | reason: CONTAINER_FULL
> scm_1 | }
> scm_1 | containerActions {
> scm_1 | containerID: 2
> scm_1 | action: CLOSE
> scm_1 | reason: CONTAINER_FULL
> scm_1 | }
> scm_1 | }
> scm_1 |
> scm_1 | --> RPC message response: SCMHeartbeatRequestProto to
> 172.25.0.2:33912
> scm_1 | datanodeUUID: "9c8f80bd-9424-4d74-99ef-a2bd58e66d7f"
> scm_1 |
> scm_1 | 2018-08-08 16:22:51 INFO CloseContainerEventHandler:56 -
> Close container Event triggered for container : 1
> scm_1 | 2018-08-08 16:22:51 INFO CloseContainerEventHandler:105 -
> container with id : 1 is in CLOSING state and need not be closed.
> scm_1 | 2018-08-08 16:22:51 INFO CloseContainerEventHandler:56 -
> Close container Event triggered for container : 2
> scm_1 | 2018-08-08 16:22:51 INFO CloseContainerEventHandler:105 -
> container with id : 2 is in CLOSING state and need not be closed.
> {code}