[ 
https://issues.apache.org/jira/browse/HADOOP-18793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741095#comment-17741095
 ] 

ASF GitHub Bot commented on HADOOP-18793:
-----------------------------------------

hdaikoku opened a new pull request, #5818:
URL: https://github.com/apache/hadoop/pull/5818

   ### Description of PR
   
   This PR fixes a bug in which StagingCommitter leaks the staging-uploads 
directory. The directory is now deleted in `StagingCommitter#cleanup()`.
   
   ### How was this patch tested?
   Ran the integration tests in 
`org.apache.hadoop.fs.s3a.commit.staging.integration` against a LocalStack 
instance running on my laptop.
   
   ```xml
       <property>
           <name>fs.s3a.endpoint</name>
           <value>s3.ap-south-1.localhost.localstack.cloud:4566</value>
       </property>
   
       <property>
           <name>fs.s3a.connection.ssl.enabled</name>
           <value>false</value>
       </property>
   ```
   
   ```
   [INFO] -------------------------------------------------------
   [INFO]  T E S T S
   [INFO] -------------------------------------------------------
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
   [INFO] Tests run: 24, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 33.196 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
   [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.62 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestDirectoryCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
   [WARNING] Tests run: 24, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 26.316 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestPartitionedCommitProtocol
   [INFO] Running org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
   [INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.166 s - in org.apache.hadoop.fs.s3a.commit.staging.integration.ITestStagingCommitProtocolFailure
   [INFO] 
   [INFO] Results:
   [INFO] 
   [WARNING] Tests run: 74, Failures: 0, Errors: 0, Skipped: 1
   ```
   
   ### For code changes:
   
   - [x] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [x] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [x] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [x] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




> S3A StagingCommitter does not clean up staging-uploads directory
> ----------------------------------------------------------------
>
>                 Key: HADOOP-18793
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18793
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.2
>            Reporter: Harunobu Daikoku
>            Priority: Minor
>
> When setting up StagingCommitter and its internal FileOutputCommitter, a 
> temporary directory that holds multipart upload (MPU) information is created 
> on the default FS. By default, its path is 
> /user/${USER}/tmp/staging/${USER}/${UUID}/staging-uploads.
> On a successful job commit, its child directory (_temporary) will be [cleaned 
> up|https://github.com/apache/hadoop/blob/a36d8adfd18e88f2752f4387ac4497aadd3a74e7/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/commit/staging/StagingCommitter.java#L516]
>  properly, but ${UUID}/staging-uploads will remain.
> As a result, empty ${UUID}/staging-uploads directories accumulate under 
> /user/${USER}/tmp/staging/${USER} and eventually cause failures in an 
> environment where the maximum number of items in a directory is capped 
> (e.g. by dfs.namenode.fs-limits.max-directory-items in HDFS).
> {noformat}
> The directory item limit of /user/${USER}/tmp/staging/${USER} is exceeded: limit=1048576 items=1048576
>       at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxDirItems(FSDirectory.java:1205)
> {noformat}
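
The leak described above can be illustrated on a local filesystem. The following is a minimal sketch, not the Hadoop code: it uses `java.nio.file` instead of the Hadoop `FileSystem` API, and the class and method names (`StagingUploadsLeakSketch`, `setupJob`, `leakyCleanup`, `fixedCleanup`) are hypothetical. It also assumes that the fix removes the whole `${UUID}` directory; the PR description only states that the staging uploads directory is deleted in `StagingCommitter#cleanup()`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Local-filesystem illustration of the leak; hypothetical names throughout. */
public class StagingUploadsLeakSketch {

  /** Creates <root>/<uuid>/staging-uploads/_temporary, mimicking job setup. */
  public static Path setupJob(Path stagingRoot) throws IOException {
    Path uploads = stagingRoot
        .resolve(UUID.randomUUID().toString())
        .resolve("staging-uploads");
    Files.createDirectories(uploads.resolve("_temporary"));
    return uploads;
  }

  /** Deletes a directory tree, children before their parents. */
  public static void deleteRecursively(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(dir)) {
      // Reverse lexicographic order lists children before their parents.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /** Reported behaviour: only _temporary is removed, <uuid>/staging-uploads stays. */
  public static void leakyCleanup(Path uploads) throws IOException {
    deleteRecursively(uploads.resolve("_temporary"));
  }

  /** Assumed fix: remove the per-job <uuid> directory and everything below it. */
  public static void fixedCleanup(Path uploads) throws IOException {
    deleteRecursively(uploads.getParent());
  }

  /** Counts the direct children of a directory. */
  public static long countChildren(Path dir) throws IOException {
    try (Stream<Path> children = Files.list(dir)) {
      return children.count();
    }
  }

  public static void main(String[] args) throws IOException {
    Path leakyRoot = Files.createTempDirectory("staging-leaky");
    Path fixedRoot = Files.createTempDirectory("staging-fixed");
    for (int job = 0; job < 3; job++) {
      leakyCleanup(setupJob(leakyRoot));
      fixedCleanup(setupJob(fixedRoot));
    }
    System.out.println("items left with leaky cleanup: " + countChildren(leakyRoot));
    System.out.println("items left with fixed cleanup: " + countChildren(fixedRoot));
  }
}
```

With the leaky cleanup, every job leaves one entry behind under the staging root, which is how the item count can eventually reach a limit such as dfs.namenode.fs-limits.max-directory-items.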



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

