[
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15840605#comment-15840605
]
Seth Fitzsimmons commented on HADOOP-14028:
-------------------------------------------
Reading through the AWS SDK code, it looks like this is the line ultimately
responsible for closing the input stream:
https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-core/src/main/java/com/amazonaws/internal/ReleasableInputStream.java#L85
I'm using the default settings (other than {{fs.s3a.fast.upload=true}}).
From watching the atimes, it looks like there's only 1 block going up at a
time while the next one fills up. (My producer is relatively slow and I'm
running in EC2, so it makes sense that the uploader can keep up.)
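For reference, here is a minimal sketch of the delete-on-close pattern under discussion: an input stream over a temporary block file that removes the file once the consumer closes the stream. The class name DeleteOnCloseInputStream is hypothetical; this is neither the S3A DiskBlock code nor the SDK's ReleasableInputStream, only an illustration of why the temp file lingers if close() is never reached.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical illustration only; not the actual S3A or AWS SDK class.
class DeleteOnCloseInputStream extends FilterInputStream {
    private final File file;
    private boolean closed;

    DeleteOnCloseInputStream(File file) throws IOException {
        super(new FileInputStream(file));
        this.file = file;
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return; // a second close() is a no-op
        }
        closed = true;
        try {
            super.close();
        } finally {
            // The temp file is removed only when the consumer closes the stream.
            Files.deleteIfExists(file.toPath());
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a disk-buffered block waiting to be uploaded.
        File block = File.createTempFile("s3ablock-", ".tmp");
        Files.write(block.toPath(), new byte[] {1, 2, 3});

        // Stand-in for the uploader: read everything, then close.
        try (DeleteOnCloseInputStream in = new DeleteOnCloseInputStream(block)) {
            while (in.read() != -1) {
                // drain the block
            }
        }
        System.out.println("exists after close: " + block.exists());
    }
}
```

With this pattern, the file disappears only if whoever reads the stream actually calls close(); if that call is skipped or deferred, the block stays on disk.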
> S3A block output streams don't clear temporary files
> ----------------------------------------------------
>
> Key: HADOOP-14028
> URL: https://issues.apache.org/jira/browse/HADOOP-14028
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.0.0-alpha2
> Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
> Reporter: Seth Fitzsimmons
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I
> was looking for after running into the same OOM problems) and don't see it
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process
> starts. My expectation is that local copies of the blocks would be deleted
> after those parts finish uploading, but I'm seeing more than 15 blocks in
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed
> after individual blocks have finished uploading or when the entire file has
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files,
> sorting by atime, and deleting everything except the 20 most recently
> accessed: `ls -ut | tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call
> > that the AWS httpclient makes on the input stream triggers the deletion.
> > Though there aren't tests for it, as I recall.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)