[ https://issues.apache.org/jira/browse/HDFS-12328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162393#comment-16162393 ]

Weiwei Yang commented on HDFS-12328:
------------------------------------

Hi [~yuanbo]

Thanks for updating the description. I noticed that you proposed introducing a
new SCM sub-command, "-txid", and I am not in favor of this. TXs are an internal
notion, and we don't need to expose them to end users. When a block cannot be
deleted after the maximum number of retries, we consider that block *corrupted*,
so from the user's perspective I think we need a *block*-level command in SCM.
Some initial thoughts:

{code}
// list all corrupted block IDs
hdfs scm -block -list --corrupted

// get as much detail about this block as possible, e.g. where its data is located,
// so the admin can log on to that datanode to debug why the deletion failed
hdfs scm -block -info xxx

// delete a certain block
hdfs scm -block -delete xxx

// delete all corrupted blocks
// this requires an extra confirmation typed by the user
hdfs scm -block -delete --corrupted 
{code}
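To make the block-level idea a little more concrete, here is a minimal Java (9+) sketch of how it could sit on top of the retry counter described in this ticket. Each deletion transaction keeps a retry count, the count becomes -1 once retries are exhausted (the HDFS-12283 behavior quoted below), and the admin-facing operations take block IDs rather than txids. Everything here is hypothetical: `DeletedBlockLogSketch`, `recordFailure`, `listCorruptedBlocks` and `purgeBlock` are illustrative names, not the SCM API. The exact point at which a count flips to -1 (here, on the first failure beyond `maxRetry`) is also an assumption.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of SCM's deleted-block transaction log.
 * Not the real Ozone/SCM code; names are illustrative only.
 */
public class DeletedBlockLogSketch {
  // Sentinel from the ticket description: count == -1 means retries are exhausted.
  static final int FAILED = -1;

  private final int maxRetry;
  private final Map<Long, Integer> retryCounts = new LinkedHashMap<>();   // txid -> retry count
  private final Map<Long, List<String>> txBlocks = new LinkedHashMap<>(); // txid -> block IDs

  public DeletedBlockLogSketch(int maxRetry) {
    this.maxRetry = maxRetry;
  }

  public void addTransaction(long txid, List<String> blockIds) {
    retryCounts.put(txid, 0);
    txBlocks.put(txid, new ArrayList<>(blockIds));
  }

  /** Record one failed deletion attempt; mark the TX as failed once it exceeds maxRetry. */
  public void recordFailure(long txid) {
    int count = retryCounts.getOrDefault(txid, FAILED);
    if (count == FAILED) {
      return; // unknown TX, or already given up on
    }
    count++;
    retryCounts.put(txid, count > maxRetry ? FAILED : count);
  }

  /** Roughly what a block-level "list --corrupted" could return: block IDs, never txids. */
  public List<String> listCorruptedBlocks() {
    List<String> result = new ArrayList<>();
    for (Map.Entry<Long, Integer> e : retryCounts.entrySet()) {
      if (e.getValue() == FAILED) {
        result.addAll(txBlocks.get(e.getKey()));
      }
    }
    return result;
  }

  /** Purge metadata for one corrupted block; drop its TX once no blocks remain in it. */
  public boolean purgeBlock(String blockId) {
    Iterator<Map.Entry<Long, List<String>>> it = txBlocks.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, List<String>> e = it.next();
      if (retryCounts.get(e.getKey()) == FAILED && e.getValue().remove(blockId)) {
        if (e.getValue().isEmpty()) {
          it.remove();
          retryCounts.remove(e.getKey());
        }
        return true;
      }
    }
    return false; // not found among corrupted blocks
  }

  public static void main(String[] args) {
    DeletedBlockLogSketch log = new DeletedBlockLogSketch(2);
    log.addTransaction(1L, List.of("blk-a", "blk-b"));
    log.addTransaction(2L, List.of("blk-c"));
    for (int i = 0; i < 3; i++) {
      log.recordFailure(1L); // third failure exceeds maxRetry = 2, so TX 1 becomes -1
    }
    log.recordFailure(2L);   // TX 2 still has retries left
    System.out.println(log.listCorruptedBlocks()); // [blk-a, blk-b]
    log.purgeBlock("blk-a");
    System.out.println(log.listCorruptedBlocks()); // [blk-b]
  }
}
```

Keying the admin operations on block IDs keeps txids internal, which is the point of the comment above; in this sketch a transaction's metadata is only dropped once every block in it has been purged.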

I have set the priority to major because I don't think this is a critical
feature that must be addressed now (let's get it done as a post-merge task).
At present, we can use SQLCli to dump the DB info for debugging as an
alternative. Also, as [~linyiqun] commented, it might be good to start by
adding corrupted blocks to SCM JMX. That is a smaller task, and it can help us
understand how big the problem is.

Thanks

> Ozone: Purge metadata of deleted blocks after max retry times
> -------------------------------------------------------------
>
>                 Key: HDFS-12328
>                 URL: https://issues.apache.org/jira/browse/HDFS-12328
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Yuanbo Liu
>            Assignee: Yuanbo Liu
>              Labels: OzonePostMerge
>
> In HDFS-12283, we set the value of count to -1 if blocks cannot be deleted 
> after the maximum number of retries. We need to provide APIs for admins to 
> purge the "-1" metadata manually. Implement these commands:
> List the txids:
> {code}
> hdfs scm -txid list -count <number> -retry <number>
> {code}
> Delete a txid:
> {code}
> hdfs scm -txid delete -id <txid>
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
