[ https://issues.apache.org/jira/browse/HDDS-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833857#comment-17833857 ]
Ivan Andika commented on HDDS-10612:
------------------------------------
I saw this log line, which might point to the root cause:
|${output} = Container 3 is in closing state|
Perhaps a CLOSE_CONTAINER event from the previous "container close" command was
still pending on the SCM event queue and had not yet been processed by the
CloseContainerCommandHandler, so "container list" still showed the container as
OPEN. The test then calls "container close" again, but if the first
CLOSE_CONTAINER event is processed in the meantime (between the "container
list" and the "container close"), SCM might throw the exception when it checks
the container state in SCMClientProtocolServer#closeContainer.
A possible solution is to call "ozone admin container list" only once and send
the close container command for each listed container, instead of calling the
list multiple times.
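
As a rough illustration of the "list once, then close each" idea, here is a
minimal Python sketch. It does not call the real Ozone CLI or the Robot test:
FakeScm, close_containers_once, and the RuntimeError it raises are hypothetical
stand-ins that only model the OPEN -> CLOSING transition described above.

```python
from typing import Callable, Dict, Iterable, List


def close_containers_once(
    list_open: Callable[[], Iterable[int]],
    close: Callable[[int], None],
) -> List[int]:
    """List the OPEN containers a single time, then close each listed ID once."""
    snapshot = list(list_open())  # the only "container list" call
    for container_id in snapshot:
        close(container_id)  # exactly one "container close" per listed ID
    return snapshot


class FakeScm:
    """Hypothetical in-memory stand-in for SCM, for illustration only."""

    def __init__(self, container_ids: Iterable[int]) -> None:
        self.states: Dict[int, str] = {cid: "OPEN" for cid in container_ids}
        self.list_calls = 0

    def list_open(self) -> List[int]:
        self.list_calls += 1
        return [cid for cid, state in self.states.items() if state == "OPEN"]

    def close(self, container_id: int) -> None:
        # Assumption: a close request for a container that is no longer OPEN
        # is rejected, which is the failure suspected in the test.
        if self.states[container_id] != "OPEN":
            raise RuntimeError(f"Container {container_id} is in closing state")
        self.states[container_id] = "CLOSING"


scm = FakeScm([1, 2, 3])
closed = close_containers_once(scm.list_open, scm.close)
```

Because the list is taken once, no container ID is submitted for closing
twice, so the rejected second close cannot be triggered by the test itself.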
> Add Robot test to verify Container Balancer for RATIS containers
> ----------------------------------------------------------------
>
> Key: HDDS-10612
> URL: https://issues.apache.org/jira/browse/HDDS-10612
> Project: Apache Ozone
> Issue Type: Test
> Components: test
> Reporter: Anastasia Filippova
> Assignee: Anastasia Filippova
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.5.0
>
>
> Currently there are only unit tests for Container Balancer and no acceptance
> tests at all. At a minimum, we should add a Robot test to verify Container
> Balancer for RATIS containers. In the future, we should probably also add a
> Robot test for the EC case.
> Test case:
> 1. Move 1 datanode to maintenance mode (we use 4 datanodes in this test)
> 2. Create multiple keys (after loading the data, we check that 3 datanodes
> are ~60% busy, and the one that is in maintenance mode is empty)
> 4. Start datanode recommission (wait until datanode recommissioning is
> completed)
> 5. Start container balancer (wait until container balancer is completed)
> 6. Check results (after balancing on all 4 datanodes, we should see
> approximately the same data distribution.)