[
https://issues.apache.org/jira/browse/HADOOP-18793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740675#comment-17740675
]
Harunobu Daikoku commented on HADOOP-18793:
-------------------------------------------
Hi [[email protected]], I checked, and workPath here appears to point to a
local filesystem path under fs.s3a.buffer.dir.
{code:none}
23/07/06 15:54:56 DEBUG AbstractS3ACommitter: Setting work path to
file:/ssd/yarn/nm/usercache/${USER}/appcache/application_1687403148782_2731269/s3a/0e9bd28b-296c-4d6e-bff1-3f7049bd7ecd/_temporary/0/_temporary/attempt_202307061554568951051086712533391_0000_m_000000_0
23/07/06 15:54:56 INFO AbstractS3ACommitterFactory: Using Commmitter
StagingCommitter{AbstractS3ACommitter{role=Task committer
attempt_202307061554568951051086712533391_0000_m_000000_0, name=directory,
outputPath=s3a://${USER}/checkpoints/user/${USER}/7663e766-9630-434c-aa7d-a8f429b98dff/e67f2bbf-3850-4f63-9b4b-b722a5fbd81f,
workPath=file:/ssd/yarn/nm/usercache/${USER}/appcache/application_1687403148782_2731269/s3a/0e9bd28b-296c-4d6e-bff1-3f7049bd7ecd/_temporary/0/_temporary/attempt_202307061554568951051086712533391_0000_m_000000_0},
conflictResolution=REPLACE,
wrappedCommitter=FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl{jobId=job_202307061554568951051086712533391_0000};
taskId=attempt_202307061554568951051086712533391_0000_m_000000_0, status=''};
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter@11ac570b};
outputPath=hdfs://nameservice1/user/${USER}/tmp/staging/${USER}/0e9bd28b-296c-4d6e-bff1-3f7049bd7ecd/staging-uploads,
workPath=null, algorithmVersion=1, skipCleanup=false,
ignoreCleanupFailures=true}} for
s3a://${USER}/checkpoints/user/${USER}/7663e766-9630-434c-aa7d-a8f429b98dff/e67f2bbf-3850-4f63-9b4b-b722a5fbd81f
23/07/06 15:54:56 DEBUG StagingCommitter: Cleaning up work path
file:/ssd/yarn/nm/usercache/${USER}/appcache/application_1687403148782_2731269/s3a/0e9bd28b-296c-4d6e-bff1-3f7049bd7ecd/_temporary/0/_temporary/attempt_202307061554568951051086712533391_0000_m_000000_0
23/07/06 15:54:56 INFO AbstractS3ACommitter: Task committer
attempt_202307061554568951051086712533391_0000_m_000000_0: commitJob((no job
ID)): duration 0:00.430s
{code}
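To make the distinction in the log above easier to see, here is a minimal sketch that classifies each path by its URI scheme. The paths are shortened placeholders modelled on the log output, and filesystem_of is a hypothetical helper for illustration, not part of Hadoop.

```python
from urllib.parse import urlparse


def filesystem_of(path: str) -> str:
    """Return the URI scheme of a Hadoop-style path (hypothetical helper).

    Paths without a scheme resolve against fs.defaultFS in Hadoop, so they
    are reported here as "default-fs".
    """
    scheme = urlparse(path).scheme
    return scheme or "default-fs"


# Placeholder paths modelled on the log output; IDs are shortened.
paths = {
    "staging committer workPath":
        "file:/ssd/yarn/nm/usercache/USER/appcache/APP_ID/s3a/UUID/_temporary",
    "staging committer outputPath":
        "s3a://BUCKET/checkpoints/OUTPUT",
    "wrapped committer outputPath":
        "hdfs://nameservice1/user/USER/tmp/staging/USER/UUID/staging-uploads",
}

for role, path in paths.items():
    print(f"{role}: {filesystem_of(path)}")
```

Read this way, the work path that the "Cleaning up work path" message refers to is on the local filesystem, while the staging-uploads directory belongs to the wrapped FileOutputCommitter on HDFS.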
> S3A StagingCommitter does not clean up staging-uploads directory
> ----------------------------------------------------------------
>
> Key: HADOOP-18793
> URL: https://issues.apache.org/jira/browse/HADOOP-18793
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.2
> Reporter: Harunobu Daikoku
> Priority: Minor
>
> When StagingCommitter and its internal FileOutputCommitter are set up, a
> temporary directory that holds multipart upload (MPU) information is created
> on the default FS. By default this is
> /user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.
> On a successful job commit, its child directory (_temporary) is [cleaned
> up|https://github.com/apache/hadoop/blob/a36d8adfd18e88f2752f4387ac4497aadd3a74e7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L516]
> properly, but ${UUID}/staging-uploads remains.
> As a result, empty ${UUID}/staging-uploads directories accumulate
> under /user/${USER}/tmp/staging/${USER} and eventually cause failures
> in an environment where the maximum number of items in a directory is capped
> (e.g. by dfs.namenode.fs-limits.max-directory-items in HDFS).
> {noformat}
> The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded:
> limit=1048576 items=1048576
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
> {noformat}
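As an illustration of the leftover-directory pattern described in the issue, the following is a rough sketch of a manual cleanup pass. It is not the committer fix: it runs against a local directory tree standing in for /user/${USER}/tmp/staging/${USER}, and remove_empty_staging_dirs is a hypothetical helper. Against HDFS the same logic would need the Hadoop FileSystem API or the hdfs dfs CLI instead of pathlib.

```python
import tempfile
from pathlib import Path


def remove_empty_staging_dirs(staging_root: Path) -> list:
    """Delete <uuid>/staging-uploads directories that are already empty.

    Returns the names of the <uuid> directories that were cleaned up.
    Directories that still contain entries (e.g. _temporary) are left alone.
    """
    removed = []
    for job_dir in sorted(staging_root.iterdir()):
        uploads = job_dir / "staging-uploads"
        if not uploads.is_dir():
            continue
        if any(uploads.iterdir()):
            continue  # still holds _temporary or pending upload data
        uploads.rmdir()
        # Remove the <uuid> parent as well if nothing else lives in it.
        if not any(job_dir.iterdir()):
            job_dir.rmdir()
        removed.append(job_dir.name)
    return removed


# Small demonstration on a throwaway local directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "uuid-a" / "staging-uploads").mkdir(parents=True)
    (root / "uuid-b" / "staging-uploads" / "_temporary").mkdir(parents=True)
    print(remove_empty_staging_dirs(root))
```

The sketch only touches directories that are already empty, but a job that has just been set up may briefly look the same, so a real cleanup would probably also want an age threshold before deleting anything.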