[ https://issues.apache.org/jira/browse/HDDS-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599155#comment-17599155 ]

Ethan Rose commented on HDDS-7196:
----------------------------------

Unfortunately there is no good way to clean up the data without open key
cleanup. Ozone does not have an fsck command. There are other tools that can be
used, but they only operate on the committed namespace, not the open key space,
which is hidden from clients. To see which blocks in which containers map to a
given key, use the {{ozone sh key info <keyname>}} command. To see which
containers have blocks for certain keys, Recon provides the
/api/v1/containers/:id/keys REST API, documented
[here|https://ozone.apache.org/docs/current/interface/reconapi.html]. SCM can
tell you which datanodes have a container via the
{{ozone admin container info}} command. Data would have to be removed from disk
manually. However, I would caution against deleting individual blocks from
containers, since a container can be marked unhealthy by the system if its
metadata and on-disk data diverge. If the whole container is marked unhealthy,
then even good blocks in the container will be considered lost. Deleting the
containers entirely from disk will cause SCM and Recon to mark the container as
missing, but this is cosmetic if the data in that container is not present in
the namespace.
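
As a rough sketch of how those commands chain together (the key path, container
ID, and Recon host below are placeholders, and 9888 is only Recon's default
HTTP port, so adjust for your deployment):
{code:bash}
# Which blocks in which containers back a given committed key (key path is a placeholder).
ozone sh key info /testvol/testbucket/somekey

# Which keys Recon maps to a given container (assumes Recon's default HTTP port 9888).
curl "http://<recon-host>:9888/api/v1/containers/<containerID>/keys"

# Which datanodes hold replicas of that container, according to SCM.
ozone admin container info <containerID>
{code}
Keep in mind these only cover the committed namespace; keys that were never
committed (such as the failed teragen output) will not show up this way.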

 

Is this a test cluster? It may be easier to do a fresh install in that case.

> Disk space used by failed job(teragen here) is not reclaimable
> --------------------------------------------------------------
>
>                 Key: HDDS-7196
>                 URL: https://issues.apache.org/jira/browse/HDDS-7196
>             Project: Apache Ozone
>          Issue Type: Improvement
>          Components: Ozone Datanode
>         Environment: |Apache Ozone|1.0.0|
>            Reporter: Franklinsam Paul
>            Priority: Major
>         Attachments: Ozone usage_ after_failing_cleanup.png, Ozone usage_ fresh_install.png
>
>
> On a fresh Ozone cluster, I ran a teragen job and killed it at around 25%
> completion. This left Ozone with about 74.4GB used, but none of the files
> written are listed.
> The issue can be reproduced with the steps below. (Snapshots from the Recon
> UI are attached for usage reference.)
> {code:java}
> ozone sh volume create  o3://ozonefrankserviceid/testvol/
> ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
> yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen -Dmapreduce.job.maps=2 1000000000 ofs://ozonefrankserviceid/testvol/testbucket
>  
>  ozone sh volume create  o3://ozonefrankserviceid/testvol/
>  ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
>  
>  yarn jar /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar teragen -Dmapreduce.job.maps=2 1000000000 ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
>  
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
>  ozone fs -du -s -h ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000000_2
>  --> no files/object
>  ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000001_2
>  --> no files/object
>  
>  Ozone usage shown in the Recon UI increased to about 75GB
>  
>  hdfs dfs -rm -r -skipTrash ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
>  ozone sh bucket delete o3://ozonefrankserviceid/testvol/testbucket
>  
>  [root@DNHOST1 ozone-conf]# grep -A1 'hdds.datanode.dir' ozone-site.xml
>     <name>hdds.datanode.dir</name>
>     <value>/var/lib/hadoop-ozone/datanode/data</value>
> [root@DNHOST1 ozone-conf]#
> [root@DNHOST1 containerDir0]# du -sh /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> 26G    /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> [root@DNHOST1 containerDir0]#
> [root@DNHOST1 chunks]# ozone sh volume list o3://ozonefrankserviceid/ -a | egrep 'name|usedNamespace'
>   "name" : "s3v",
>   "usedNamespace" : 0,
>     "name" : "om",
>   "name" : "testvol",
>   "usedNamespace" : 0,
>     "name" : "hive/[email protected]",
>     "name" : "hive",
> [root@DNHOST1 chunks]# {code}



