[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870560#comment-15870560
 ] 

Steve Loughran edited comment on HADOOP-14028 at 2/16/17 7:27 PM:
------------------------------------------------------------------

patch 006: pass the File down to the put request, simplify blockUploadData, and 
tune the tests.

I've now been through the AWS SDK code and tested what happens in the java.io 
code:

# We MUST pass in the File instance for reliable uploads of file data.
# Cleanup must therefore always happen in the block's close() call.
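
A minimal sketch of the two points above, to show why tying cleanup to the block's close() is more reliable than depending on a stream being closed by the HTTP client. This is not the code from the patch: the DiskBlock class here is a simplified stand-in, and PartUploader and uploadBlock are hypothetical names invented for illustration. The upload callback receives the File itself, and the temporary file is deleted in close() whether or not the upload succeeds.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class BlockCleanupSketch {

  /** Hypothetical stand-in for the put request: it is handed the File, not a stream. */
  interface PartUploader {
    void upload(File source) throws IOException;
  }

  /** Simplified, hypothetical disk-backed block that owns its temporary file. */
  static final class DiskBlock implements Closeable {
    private final File file;

    DiskBlock(File file) {
      this.file = file;
    }

    File file() {
      return file;
    }

    /** Cleanup lives here, so it does not depend on who closes which stream. */
    @Override
    public void close() throws IOException {
      Files.deleteIfExists(file.toPath());
    }
  }

  /** Uploads one block; try-with-resources closes the block on success and on failure. */
  static void uploadBlock(DiskBlock block, PartUploader uploader) throws IOException {
    try (DiskBlock b = block) {
      uploader.upload(b.file());
    }
  }

  public static void main(String[] args) throws IOException {
    File tmp = File.createTempFile("block-", ".tmp");
    Files.write(tmp.toPath(), "data".getBytes(StandardCharsets.UTF_8));
    try {
      // Simulate an upload that fails part way through.
      uploadBlock(new DiskBlock(tmp), f -> {
        throw new IOException("simulated upload failure");
      });
    } catch (IOException expected) {
      // The upload failed, but close() has already run by this point.
    }
    System.out.println("temp file still exists: " + tmp.exists());
  }
}
```

With this shape, the failed upload in main() still leaves no temporary file behind, because deletion is tied to the block's lifecycle and not to the uploader's handling of an input stream.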

Tested: S3 Ireland, with scale tests at 128M.


was (Author: ste...@apache.org):
patch 006: pass the File down to the put request, simplify blockUploadData, and 
tune the tests.

I've now been through the AWS SDK code and tested what happens in the java.io 
code:

# We MUST pass in the File instance for reliable uploads of file data.
# Cleanup must therefore always happen in the block's close() call.

Tested: S3 Ireland, with scale tests.

> S3A block output streams don't delete temporary files in multipart uploads
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.0
>         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>            Priority: Critical
>         Attachments: HADOOP-14028-006.patch, HADOOP-14028-branch-2-001.patch, 
> HADOOP-14028-branch-2.8-002.patch, HADOOP-14028-branch-2.8-003.patch, 
> HADOOP-14028-branch-2.8-004.patch, HADOOP-14028-branch-2.8-005.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.


