[
https://issues.apache.org/jira/browse/HDDS-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599155#comment-17599155
]
Ethan Rose commented on HDDS-7196:
----------------------------------
Unfortunately, there is no good way to clean up this data without open key
cleanup. Ozone does not have an fsck command. Other tools exist, but they
operate only on the committed namespace, not on the open key space, which is
hidden from clients. To see which blocks in which containers map to a given
key, use the {{ozone sh key info <keyname>}} command. To see which keys have
blocks in a given container, use Recon's /api/v1/containers/:id/keys REST API,
documented [here|https://ozone.apache.org/docs/current/interface/reconapi.html].
SCM can tell you which datanodes have a container using the {{ozone admin
container info}} command. Data would have to be removed from disk manually.
However, I would caution against deleting individual blocks from containers,
since the system can mark a container unhealthy if its metadata and on-disk
data diverge. If the whole container is marked unhealthy, then even good blocks
in the container will be considered lost. Deleting the containers entirely from
disk will cause SCM and Recon to mark them as missing, but this is cosmetic if
the data in those containers is not present in the namespace.
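As a rough illustration of the lookups mentioned above, here is a minimal
Python sketch. It assumes an unsecured Recon reachable over plain HTTP on its
default port 9888. The helper names are hypothetical and not part of Ozone.
The sketch returns Recon's JSON as-is rather than assuming its field names.
Like the tools themselves, it only sees the committed namespace, so blocks
belonging to open keys will not show up.

```python
import json
import urllib.request

# Assumption: Recon's HTTP endpoint is on the default port 9888 and is not
# protected by Kerberos/SPNEGO. Adjust the base URL for your cluster.
DEFAULT_RECON = "http://localhost:9888"


def container_keys_url(container_id, recon_base=DEFAULT_RECON):
    """Build the Recon URL for /api/v1/containers/:id/keys."""
    return f"{recon_base.rstrip('/')}/api/v1/containers/{int(container_id)}/keys"


def fetch_container_keys(container_id, recon_base=DEFAULT_RECON, timeout=10):
    """Return Recon's parsed JSON response (requires a running Recon)."""
    url = container_keys_url(container_id, recon_base)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def key_info_cmd(key_uri):
    """argv for 'ozone sh key info', which shows a key's block locations."""
    return ["ozone", "sh", "key", "info", key_uri]


def container_info_cmd(container_id):
    """argv for 'ozone admin container info', which shows a container's datanodes."""
    return ["ozone", "admin", "container", "info", str(int(container_id))]
```

On a host with the Ozone client installed, the argv lists can be passed to
{{subprocess.run}}, and {{fetch_container_keys(<id>)}} can be called for each
container ID of interest.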
Is this a test cluster? It may be easier to do a fresh install in that case.
> Disk space used by failed job(teragen here) is not reclaimable
> --------------------------------------------------------------
>
> Key: HDDS-7196
> URL: https://issues.apache.org/jira/browse/HDDS-7196
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Environment: |Apache Ozone|1.0.0|
> Reporter: Franklinsam Paul
> Priority: Major
> Attachments: Ozone usage_ after_failing_cleanup.png, Ozone usage_
> fresh_install.png
>
>
> On a fresh Ozone cluster, I ran a teragen job and killed it at around 25%
> completion. This left about 74.4GB used in Ozone, but none of the files
> written are listed.
> The issue can be reproduced with the steps below (snapshots from the Recon UI
> are attached for usage reference).
> {code:java}
> ozone sh volume create o3://ozonefrankserviceid/testvol/
> ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
> yarn jar
> /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
> teragen -Dmapreduce.job.maps=2 1000000000
> ofs://ozonefrankserviceid/testvol/testbucket
> ozone sh volume create o3://ozonefrankserviceid/testvol/
> ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
>
> yarn jar
> /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
> teragen -Dmapreduce.job.maps=2 1000000000
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
>
> ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
> ozone fs -du -s -h
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000000_2
> --> no files/object
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000001_2
> --> no files/object
>
> Ozone usage is increased in the recon UI as 75GB
>
> hdfs dfs -rm -r -skipTrash
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
> ozone sh bucket delete o3://ozonefrankserviceid/testvol/testbucket
>
> [root@DNHOST1 ozone-conf]# grep -A1 'hdds.datanode.dir' ozone-site.xml
> <name>hdds.datanode.dir</name>
> <value>/var/lib/hadoop-ozone/datanode/data</value>
> [root@DNHOST1 ozone-conf]#
> [root@DNHOST1 containerDir0]# du -sh
> /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> 26G
> /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> [root@DNHOST1 containerDir0]#
> [root@DNHOST1 chunks]# ozone sh volume list o3://ozonefrankserviceid/ -a
> |egrep 'name|usedNamespace'
> "name" : "s3v",
> "usedNamespace" : 0,
> "name" : "om",
> "name" : "testvol",
> "usedNamespace" : 0,
> "name" : "hive/[email protected]",
> "name" : "hive",
> [root@DNHOST1 chunks]# {code}