[ 
https://issues.apache.org/jira/browse/HDFS-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019087#comment-16019087
 ] 

Weiwei Yang commented on HDFS-11844:
------------------------------------

Hi [~xyao]

Thanks for your comments. Sorry, the description was a bit confusing; I have just 
updated it and hope it makes more sense now. Here is how to reproduce the problem:

1. Create a container *20170522c1* via CLI

{code}
hdfs scm -container -create -c 20170522c1
{code}

2. Get the info of this container via the CLI. This command was implemented in 
HDFS-11680.

{code}
hdfs scm -container -info 20170522c1
{code}

This command returns:

{noformat}
Container Name: 20170522c1
Container State: OPEN
Container DB Path: /home/wwei/hadoop-data/hdfs/data/containers/20170522c1/metadata/container.db
Container Path: /home/wwei/hadoop-data/scm/repository/20170522c1.container
Container Metadata: {}
LeaderID: ozone1.fyre.ibm.com
Datanodes: [ozone1.fyre.ibm.com]
{noformat}

3. Restart DN

4. Run the info command again on the same container *20170522c1*

{code}
hdfs scm -container -info 20170522c1
{code}

It fails with the following error:

{noformat}
Error executing command:org.apache.hadoop.scm.container.common.helpers.StorageContainerException: Unable to find the container. Name: 20170522c1
{noformat}

This happens because the DN maintains an in-memory container mapping, 
{{containerMap}}, which is not reloaded when the DN restarts. This mapping is 
used everywhere as a fast check of container state before querying the DB. So I 
am proposing to restore its state during restart.
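
To illustrate the idea, here is a minimal sketch of rebuilding such a map at startup. It is not the actual {{ContainerManagerImpl}} code: the class {{ContainerMapSketch}}, the {{reload}} and {{contains}} methods, and the choice of scanning a directory for files ending in {{.container}} (a suffix borrowed from the "Container Path" shown in the info output above) are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the DN-side container cache; not the real ContainerManagerImpl. */
class ContainerMapSketch {
  // Assumed on-disk suffix, borrowed from the "Container Path" in the info output.
  static final String CONTAINER_SUFFIX = ".container";

  // In-memory mapping: container name -> path of its on-disk container file.
  private final Map<String, Path> containerMap = new ConcurrentHashMap<>();

  /** Rebuilds the map by scanning the given directory; returns the number of entries loaded. */
  int reload(Path repositoryDir) throws IOException {
    containerMap.clear();
    try (DirectoryStream<Path> files =
             Files.newDirectoryStream(repositoryDir, "*" + CONTAINER_SUFFIX)) {
      for (Path file : files) {
        String fileName = file.getFileName().toString();
        // Strip the suffix to recover the container name.
        String name = fileName.substring(0, fileName.length() - CONTAINER_SUFFIX.length());
        containerMap.put(name, file);
      }
    }
    return containerMap.size();
  }

  /** Fast in-memory existence check, performed before any DB query. */
  boolean contains(String containerName) {
    return containerMap.containsKey(containerName);
  }
}
```

With something along these lines, calling {{reload}} once while the DN starts up, before it serves requests, would let the lookup in step 4 find *20170522c1* again. Whether the reload should run synchronously or in a background thread is a separate design choice.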

We do have an open JIRA to implement the GET container API, HDFS-11677, but that 
one is not done yet. I am testing with HDFS-11680, which is able to work with 
both OPEN and CLOSED containers. Because of this problem, however, it currently 
behaves inconsistently.

Please let me know your thoughts. Thank you.

> Ozone: Recover SCM state when SCM is restarted
> ----------------------------------------------
>
>                 Key: HDFS-11844
>                 URL: https://issues.apache.org/jira/browse/HDFS-11844
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone, scm
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>
> SCM loses its state once it is restarted. This issue can be found by a 
> simple test with the following steps:
> # Start NN, DN, SCM
> # Create several containers via SCM CLI
> # Restart DN
> # Get existing container info via SCM CLI; this step will fail with a 
> "container doesn't exist" error.
> {{ContainerManagerImpl}} maintains a cache of the container mapping, 
> {{containerMap}}. If the DN is restarted, this information is lost. We need a 
> way to restore the state from the DB in a background thread.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
