[ 
https://issues.apache.org/jira/browse/HDDS-10612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833857#comment-17833857
 ] 

Ivan Andika commented on HDDS-10612:
------------------------------------

I saw this log line, which might point to the root cause:
|${output} = Container 3 is in closing state|

Perhaps there was already a pending CLOSE_CONTAINER event on the SCM event 
queue from the previous "container close" command that had not yet been 
processed by the CloseContainerCommandHandler, so "container list" still showed 
the container as OPEN. Seeing it as OPEN, the test calls "container close" 
again. If the first CLOSE_CONTAINER event was processed in the meantime 
(between the "container list" and the "container close"), SCM might throw this 
exception when it checks the container state in 
SCMClientProtocolServer#closeContainer.
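To make the suspected interleaving concrete, here is a simplified Python model of it, for illustration only. FakeScm, its methods, and the OPEN to CLOSING transition are hypothetical stand-ins, not Ozone classes, and the real test is written in Robot Framework. The model assumes closeContainer rejects any container that is no longer OPEN, as the log line suggests.

```python
from collections import deque


class FakeScm:
    """Hypothetical, simplified model of SCM container state (not Ozone's API)."""

    def __init__(self, container_ids):
        self.state = {cid: "OPEN" for cid in container_ids}
        self.pending = deque()  # queued CLOSE_CONTAINER events, not yet handled

    def list_containers(self):
        # Reports committed state only; queued events are invisible here.
        return dict(self.state)

    def close_container(self, cid):
        # Assumed to mirror the state check in SCMClientProtocolServer#closeContainer.
        if self.state[cid] != "OPEN":
            raise RuntimeError(f"Container {cid} is in {self.state[cid].lower()} state")
        self.pending.append(cid)

    def handle_next_event(self):
        # Stand-in for the asynchronous handler moving the container to CLOSING.
        cid = self.pending.popleft()
        self.state[cid] = "CLOSING"


def replay_race():
    """Replay the suspected interleaving and return what each step observed."""
    scm = FakeScm([3])
    first_listing = scm.list_containers()[3]   # "OPEN"
    scm.close_container(3)                     # first close: event queued
    second_listing = scm.list_containers()[3]  # still "OPEN", event pending
    scm.handle_next_event()                    # handler runs before the next close
    try:
        scm.close_container(3)                 # second close now sees CLOSING
        error = None
    except RuntimeError as exc:
        error = str(exc)
    return first_listing, second_listing, error


print(replay_race())  # ('OPEN', 'OPEN', 'Container 3 is in closing state')
```

Both listings report OPEN, yet the second close fails with the same message as the log, which is consistent with the race described above.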

A solution might be to call "ozone admin container list" only once and send 
the close container command for each container in that single listing, instead 
of calling the list multiple times.
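A minimal sketch of that approach, written in Python rather than Robot Framework. It assumes the output of one "ozone admin container list" call has already been parsed into a mapping from container ID to state; close_open_containers and the close callback are hypothetical names, and in the real test close would run "ozone admin container close" for the given ID.

```python
from typing import Callable, Dict, List


def close_open_containers(listing: Dict[int, str],
                          close: Callable[[int], None]) -> List[int]:
    """Close every container that was OPEN in one up-front listing.

    The listing is taken once and never refreshed, so each container is
    closed at most once, even if SCM has not yet processed the earlier
    CLOSE_CONTAINER events when the next close is sent.
    """
    targets = sorted(cid for cid, state in listing.items() if state == "OPEN")
    for cid in targets:
        close(cid)  # hypothetical hook, e.g. running "container close <cid>"
    return targets


# Usage: a listing captured once; close requests are recorded, not sent.
listing = {1: "OPEN", 2: "CLOSED", 3: "OPEN"}
sent = []
print(close_open_containers(listing, sent.append))  # [1, 3]
```

The design choice is simply to stop re-reading state that lags behind pending events: since the loop never consults a fresh listing, a stale OPEN can no longer trigger a second close.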

> Add Robot test to verify Container Balancer for RATIS containers
> ----------------------------------------------------------------
>
>                 Key: HDDS-10612
>                 URL: https://issues.apache.org/jira/browse/HDDS-10612
>             Project: Apache Ozone
>          Issue Type: Test
>          Components: test
>            Reporter: Anastasia Filippova
>            Assignee: Anastasia Filippova
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.5.0
>
>
> Currently there are only unit tests for Container Balancer and no acceptance 
> tests at all. At a minimum, we should add a Robot test to verify Container 
> Balancer for RATIS containers, and in the future we should probably add a 
> Robot test for the EC case.
> Test case:
> 1. Move 1 datanode to maintenance mode (we use 4 datanodes in this test)
> 2. Create multiple keys (after loading the data, we check that 3 datanodes 
> are ~60% busy and the one in maintenance mode is empty)
> 3. Recommission the datanode (wait until recommissioning is complete)
> 4. Start Container Balancer (wait until it completes)
> 5. Check the results (after balancing, all 4 datanodes should show 
> approximately the same data distribution)
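The final check ("approximately the same data distribution") needs a concrete pass/fail rule to be automated. One possible rule, sketched in Python with a hypothetical is_balanced helper: every datanode's utilization must be within an assumed tolerance of the cluster-wide utilization. The 10-percentage-point tolerance and the byte counts below are illustrative assumptions chosen to match the ~60% figure above, not values from the test or from Container Balancer.

```python
from typing import Dict


def is_balanced(used: Dict[str, int], capacity: Dict[str, int],
                tolerance: float = 0.10) -> bool:
    """True if each datanode's utilization is within `tolerance` of the cluster's.

    Cluster utilization is total used / total capacity, so nodes of different
    sizes are weighted by capacity. The 0.10 default is an assumed value.
    """
    cluster = sum(used.values()) / sum(capacity.values())
    return all(abs(used[dn] / capacity[dn] - cluster) <= tolerance for dn in used)


# Illustrative numbers only: 4 equal datanodes, dn4 empty after maintenance.
capacity = {"dn1": 100, "dn2": 100, "dn3": 100, "dn4": 100}
before = {"dn1": 60, "dn2": 60, "dn3": 60, "dn4": 0}
after = {"dn1": 45, "dn2": 45, "dn3": 45, "dn4": 45}
print(is_balanced(before, capacity), is_balanced(after, capacity))  # False True
```

Comparing against the capacity-weighted cluster average, rather than requiring identical usage, keeps the check tolerant of the small differences left after balancing.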



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
