[
https://issues.apache.org/jira/browse/HADOOP-17386?focusedWorklogId=711953&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-711953
]
ASF GitHub Bot logged work on HADOOP-17386:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 20/Jan/22 10:19
Start Date: 20/Jan/22 10:19
Worklog Time Spent: 10m
Work Description: monthonk opened a new pull request #3908:
URL: https://github.com/apache/hadoop/pull/3908
### Description of PR
fs.s3a.buffer.dir defaults to hadoop.tmp.dir, which is /tmp or similar. Many
systems don't clean up /tmp until reboot, so if they stay up for a long time
they accumulate files written through the S3A staging committer from Spark
containers which fail.
Fix: use ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a as the option, so that if
env.LOCAL_DIRS is set it is used in preference to hadoop.tmp.dir. YARN-deployed
apps will use that for the buffer dir. When the app container is destroyed, so
is the directory.
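
The fallback in that option value can be modelled with a small sketch. This is not Hadoop's `Configuration` code; `resolve` is a hypothetical helper, the paths are made up, and it assumes the `:-` form behaves like the shell default-value operator (fall back when the variable is unset or empty).

```python
import re

# Innermost ${...} reference: no "$", "{" or "}" inside the braces.
_REF = re.compile(r"\$\{([^${}]+)\}")


def resolve(value, props, env, max_passes=20):
    """Expand ${name} and ${env.NAME:-fallback} references (illustrative model only)."""

    def substitute(match):
        ref = match.group(1)
        if not ref.startswith("env."):
            # Plain property reference; leave it untouched if unknown.
            return props.get(ref, match.group(0))
        name, sep, fallback = ref[len("env."):].partition(":-")
        current = env.get(name, "")
        if current:
            return current
        # Assumption: ":-" falls back when the variable is unset or empty.
        return fallback if sep else match.group(0)

    # Resolve inner references first, then repeat until nothing changes.
    for _ in range(max_passes):
        expanded = _REF.sub(substitute, value)
        if expanded == value:
            break
        value = expanded
    return value


OPTION = "${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a"
props = {"hadoop.tmp.dir": "/tmp/hadoop-alice"}  # made-up value

# Inside a YARN container LOCAL_DIRS is set, so it wins.
in_yarn = resolve(OPTION, props, {"LOCAL_DIRS": "/yarn/nm/appcache/app_1/container_1"})
# Outside YARN the variable is absent, so hadoop.tmp.dir is used.
outside_yarn = resolve(OPTION, props, {})

print(in_yarn)
print(outside_yarn)
```

The sketch treats the value of LOCAL_DIRS as a single opaque string and simply appends /s3a to it.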
### How was this patch tested?
Injected the LOCAL_DIRS environment variable and verified that S3A picked it
up. Also verified that when it is not set, hadoop.tmp.dir is used as the
fallback.
### For code changes:
- [x] Does the title of this PR start with the corresponding JIRA issue id
(e.g. 'HADOOP-17799. Your PR title ...')?
- [x] Object storage: have the integration tests been executed and the
endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies
licensed in a way that is compatible for inclusion under [ASF
2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`,
`NOTICE-binary` files?
--
This is an automated message from the Apache Git Service.
Issue Time Tracking
-------------------
Worklog Id: (was: 711953)
Remaining Estimate: 0h
Time Spent: 10m
> fs.s3a.buffer.dir to be under Yarn container path on yarn applications
> ----------------------------------------------------------------------
>
> Key: HADOOP-17386
> URL: https://issues.apache.org/jira/browse/HADOOP-17386
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.0
> Reporter: Steve Loughran
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> # fs.s3a.buffer.dir defaults to hadoop.tmp.dir which is /tmp or similar
> # we use this for storing file blocks during upload
> # staging committers use it for all files in a task, which can be a lot more
> # a lot of systems don't clean up /tmp until reboot, so if they stay up for
> a long time they accumulate files written through the s3a staging committer
> from spark containers which fail
> Fix: use ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a as the option, so that if
> env.LOCAL_DIRS is set it is used in preference to hadoop.tmp.dir.
> YARN-deployed apps will use that for the buffer dir. When the app container
> is destroyed, so is the directory.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)