[
https://issues.apache.org/jira/browse/HADOOP-18793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741095#comment-17741095
]
ASF GitHub Bot commented on HADOOP-18793:
-----------------------------------------
hdaikoku opened a new pull request, #5818:
URL: https://github.com/apache/hadoop/pull/5818
### Description of PR
This PR fixes a bug in StagingCommitter that leaks the staging uploads
directory. The fix deletes that directory in `StagingCommitter#cleanup()`.
### How was this patch tested?
Ran the integration tests in
`org.apache.hadoop.fs.s3a.commit.staging.integration` against a LocalStack
instance running on my laptop.
```xml
<property>
<name>fs.s3a.endpoint</name>
<value>s3.ap-south-1.localhost.localstack.cloud:4566</value>
</property>
<property>
<name>fs.s3a.connection.ssl.enabled</name>
<value>false</value>
</property>
```
```
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
[INFO] Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.196 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.62 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
[WARNING] Tests run: 24, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 26.316 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
[INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.166 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 74, Failures: 0, Errors: 0, Skipped: 1
```
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [x] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [x] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under
[ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [x] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
> S3A StagingCommitter does not clean up staging-uploads directory
> ----------------------------------------------------------------
>
> Key: HADOOP-18793
> URL: https://issues.apache.org/jira/browse/HADOOP-18793
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.2.2
> Reporter: Harunobu Daikoku
> Priority: Minor
>
> When setting up StagingCommitter and its internal FileOutputCommitter, a
> temporary directory that holds MPU information is created on the default
> FS. By default this directory is
> /user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.
> On a successful job commit, its child directory (_temporary) will be [cleaned
> up|https://github.com/apache/hadoop/blob/a36d8adfd18e88f2752f4387ac4497aadd3a74e7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L516]
> properly, but ${UUID}/staging-uploads will remain.
> This will result in having too many empty ${UUID}/staging-uploads directories
> under /user/${USER}/tmp/staging/${USER}, and will eventually cause an issue
> in an environment where the max number of items in a directory is capped
> (e.g. by dfs.namenode.fs-limits.max-directory-items in HDFS).
> {noformat}
> The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded:
> limit=1048576 items=1048576
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
> {noformat}