Harunobu Daikoku created HADOOP-18793:
-----------------------------------------
Summary: S3A StagingCommitter does not clean up staging-uploads directory
Key: HADOOP-18793
URL: https://issues.apache.org/jira/browse/HADOOP-18793
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Affects Versions: 3.2.2
Reporter: Harunobu Daikoku

When StagingCommitter and its internal FileOutputCommitter are set up, a
temporary directory that holds MPU (multipart upload) information is created
on the default FS. By default, this directory is
/user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.
On a successful job commit, its child directory (_temporary) is [cleaned up|https://github.com/apache/hadoop/blob/a36d8adfd18e88f2752f4387ac4497aadd3a74e7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L516]
properly, but ${UUID}/staging-uploads itself is left behind.
As a result, empty ${UUID}/staging-uploads directories accumulate under
/user/${USER}/tmp/staging/${USER}. In an environment where the maximum number
of items in a directory is capped (e.g. by
dfs.namenode.fs-limits.max-directory-items in HDFS), this eventually causes
errors such as:
{noformat}
The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded: limit=1048576 items=1048576
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
{noformat}
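To illustrate which directory is left behind, here is a minimal Java sketch that simulates the reported layout on the local filesystem with java.nio.file instead of HDFS. It is not the StagingCommitter code and does not use the Hadoop FileSystem API. The class name StagingLeakDemo, its helper methods, and the "delete the whole ${UUID} directory" variant are hypothetical. They show only that deleting _temporary leaves one entry per job under the parent, while recursively deleting the per-job ${UUID} directory leaves none.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local-filesystem simulation; not Hadoop or StagingCommitter code. */
public class StagingLeakDemo {

    /** Creates <stagingParent>/<uuid>/staging-uploads/_temporary and returns staging-uploads. */
    static Path createJobDirs(Path stagingParent, String uuid) throws IOException {
        Path uploads = stagingParent.resolve(uuid).resolve("staging-uploads");
        Files.createDirectories(uploads.resolve("_temporary"));
        return uploads;
    }

    /** Deletes a directory tree; reverse path order removes children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    /**
     * Simulates `jobs` successful commits and returns the number of entries left
     * directly under stagingParent.
     * cleanupWholeJobDir == false: only _temporary is removed (the reported behaviour).
     * cleanupWholeJobDir == true: the whole <uuid> directory is removed (hypothetical fix).
     */
    static long simulate(Path stagingParent, int jobs, boolean cleanupWholeJobDir)
            throws IOException {
        Files.createDirectories(stagingParent);
        for (int i = 0; i < jobs; i++) {
            String uuid = UUID.randomUUID().toString();
            Path uploads = createJobDirs(stagingParent, uuid);
            if (cleanupWholeJobDir) {
                deleteRecursively(stagingParent.resolve(uuid));
            } else {
                deleteRecursively(uploads.resolve("_temporary"));
            }
        }
        try (Stream<Path> children = Files.list(stagingParent)) {
            return children.count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("staging-demo");
        long leaked = simulate(root.resolve("leaky"), 3, false);
        long cleaned = simulate(root.resolve("fixed"), 3, true);
        System.out.println("leftover <uuid> dirs, _temporary only: " + leaked);  // 3
        System.out.println("leftover <uuid> dirs, whole job dir:   " + cleaned); // 0
        deleteRecursively(root);
    }
}
```

On HDFS, each leftover ${UUID} entry counts toward dfs.namenode.fs-limits.max-directory-items for /user/${USER}/tmp/staging/${USER}, which is how the item count reaches the limit reported in the trace above.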