[
https://issues.apache.org/jira/browse/HDDS-7196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598661#comment-17598661
]
Ethan Rose commented on HDDS-7196:
----------------------------------
Hi [~frnklnsm]. Since the data is not showing up in the committed namespace, it
looks like the clients were stopped while writing data to the datanodes but
before they could commit the data to OM to make it visible in the namespace.
This means the corresponding keys remain as open keys in the Ozone Manager. In
Ozone's master branch and the upcoming 1.3.0 release, the open key cleanup
service has been implemented. It scans the open key table for open keys that
have been there for over a week (configurable using
ozone.om.open.key.expire.threshold) and moves them to the deleted key table. In
prior versions like the 1.0.0 listed here, the open keys remain in the system
indefinitely, which appears to be what you observed.
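As a rough illustration of the expiry rule described above, here is a minimal sketch in Python. It is not Ozone's implementation: the function name, the dict/set stand-ins for the open key and deleted key tables, and the one-week default are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed default for illustration: open keys older than one week expire.
EXPIRE_THRESHOLD = timedelta(days=7)


def expire_open_keys(open_keys, deleted_keys, now, threshold=EXPIRE_THRESHOLD):
    """Move open keys older than `threshold` into `deleted_keys`.

    open_keys    -- dict mapping key name to creation time (hypothetical
                    stand-in for the open key table)
    deleted_keys -- set of key names (hypothetical stand-in for the
                    deleted key table)
    Returns the sorted list of key names that were moved.
    """
    expired = sorted(
        name for name, created in open_keys.items()
        if now - created > threshold
    )
    for name in expired:
        del open_keys[name]
        deleted_keys.add(name)
    return expired
```

Keys that were never committed, such as those left behind by a killed job, would only become eligible for deletion once they age past the threshold.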
Once keys reach the deleted key table, Ozone's normal key deletion flow takes
effect, which is also what is used when keys are explicitly deleted. Every
minute the OM will scan the deleted keys table and move up to 20,000 keys'
blocks to SCM for deletion. Every minute SCM will move 20,000 blocks to their
corresponding datanodes for deletion. Every minute datanodes will scan
containers for blocks to delete, eventually removing them from the system. Note
that blocks in open containers are not deleted until those containers are
closed.
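To get a feel for how long this pipeline can take, here is a toy model in Python of the OM and SCM stages only. It is a sketch under stated assumptions, not Ozone code: the function and parameter names are hypothetical, the per-run limits default to the 20,000 figures mentioned above, each stage is assumed to run exactly once per minute, and the datanode stage and the open-container restriction are ignored, so the result is at best a lower bound.

```python
def minutes_to_drain(num_keys, blocks_per_key=1,
                     om_keys_per_run=20_000, scm_blocks_per_run=20_000):
    """Count one-minute rounds until SCM has forwarded every block.

    All names and defaults are illustrative assumptions.
    """
    deleted_table = num_keys  # keys waiting in OM's deleted key table
    scm_queue = 0             # blocks waiting in SCM
    minutes = 0
    while deleted_table > 0 or scm_queue > 0:
        minutes += 1
        # SCM forwards blocks it received in earlier rounds to datanodes.
        sent = min(scm_queue, scm_blocks_per_run)
        scm_queue -= sent
        # OM hands the blocks of up to om_keys_per_run keys to SCM.
        moved = min(deleted_table, om_keys_per_run)
        deleted_table -= moved
        scm_queue += moved * blocks_per_key
    return minutes
```

With keys that span several blocks, the SCM stage becomes the bottleneck in this model, which is one reason space may be reclaimed noticeably later than the key deletion itself.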
> Disk space used by failed job(teragen here) is not reclaimable
> --------------------------------------------------------------
>
> Key: HDDS-7196
> URL: https://issues.apache.org/jira/browse/HDDS-7196
> Project: Apache Ozone
> Issue Type: Improvement
> Components: Ozone Datanode
> Environment: |Apache Ozone|1.0.0|
> Reporter: Franklinsam Paul
> Priority: Major
> Attachments: Ozone usage_ after_failing_cleanup.png, Ozone usage_
> fresh_install.png
>
>
> On a fresh Ozone cluster, I ran a teragen job and killed it at around 25%
> completion. This left Ozone reporting about 74.4GB used, but none of the
> files written are listed.
> The issue can be reproduced with the steps below (snapshots from the Recon UI
> are attached for usage reference).
> {code:java}
> ozone sh volume create o3://ozonefrankserviceid/testvol/
> ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
> yarn jar
> /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
> teragen -Dmapreduce.job.maps=2 1000000000
> ofs://ozonefrankserviceid/testvol/testbucket
> ozone sh volume create o3://ozonefrankserviceid/testvol/
> ozone sh bucket create o3://ozonefrankserviceid/testvol/testbucket
>
> yarn jar
> /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
> teragen -Dmapreduce.job.maps=2 1000000000
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
>
> ozone fs -ls ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
> ozone fs -du -s -h
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000000_2
> --> no files/object
> ozone fs -ls
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1/_temporary/1/_temporary/attempt_1661777485132_0001_m_000001_2
> --> no files/object
>
> Ozone usage is increased in the recon UI as 75GB
>
> hdfs dfs -rm -r -skipTrash
> ofs://ozonefrankserviceid/testvol/testbucket/teragentest1
> ozone sh bucket delete o3://ozonefrankserviceid/testvol/testbucket
>
> [root@DNHOST1 ozone-conf]# grep -A1 'hdds.datanode.dir' ozone-site.xml
> <name>hdds.datanode.dir</name>
> <value>/var/lib/hadoop-ozone/datanode/data</value>
> [root@DNHOST1 ozone-conf]#
> [root@DNHOST1 containerDir0]# du -sh
> /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> 26G
> /var/lib/hadoop-ozone/datanode/data/hdds/a9461a7f-ef81-4942-a278-15ff7602df14/current/containerDir0/
> [root@DNHOST1 containerDir0]#
> [root@DNHOST1 chunks]# ozone sh volume list o3://ozonefrankserviceid/ -a
> |egrep 'name|usedNamespace'
> "name" : "s3v",
> "usedNamespace" : 0,
> "name" : "om",
> "name" : "testvol",
> "usedNamespace" : 0,
> "name" : "hive/[email protected]",
> "name" : "hive",
> [root@DNHOST1 chunks]# {code}