Seth Fitzsimmons created HADOOP-14028:
-----------------------------------------

             Summary: S3A block output streams don't clear temporary files
                 Key: HADOOP-14028
                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/s3
    Affects Versions: 3.0.0-alpha2
         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
            Reporter: Seth Fitzsimmons


I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I was 
looking for after running into the same OOM problems), but I don't see it 
cleaning up the disk-cached blocks.
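
For context, the setup described above is roughly the following (a sketch using 
the hadoop-aws 3.0.0-alpha2 property names; the explicit buffer directory below 
is illustrative, the default resolves to a directory under /tmp, which is where 
the cached blocks are appearing):

    import org.apache.hadoop.conf.Configuration;

    public class S3AFastUploadSetup {
        public static Configuration fastUploadConf() {
            Configuration conf = new Configuration();
            // Upload blocks as the output is written instead of buffering the
            // whole file locally until close().
            conf.setBoolean("fs.s3a.fast.upload", true);
            // Buffer pending blocks on local disk (the default buffer type);
            // these are the temporary files accumulating in /tmp.
            conf.set("fs.s3a.fast.upload.buffer", "disk");
            // Illustrative override of the buffer directory; by default it is
            // a directory under hadoop.tmp.dir (i.e. somewhere beneath /tmp).
            conf.set("fs.s3a.buffer.dir", "/tmp/hadoop-s3a");
            return conf;
        }
    }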

I'm generating a ~50GB file on an instance with ~6GB free when the process 
starts. My expectation is that the local copy of each block would be deleted 
once its part finishes uploading, but I'm seeing more than 15 blocks in /tmp, 
and none of them have been deleted so far.

I see that DiskBlock deletes its temporary file when closed, but is it closed 
as soon as an individual block has finished uploading, or only once the entire 
file has been written to the FS (i.e. the full multipart upload, including all 
parts, has completed)?

As a temporary workaround to avoid running out of space, I'm listing the files, 
sorting by atime, and deleting everything except the 20 most recently used: 
`ls -ut | tail -n +21 | xargs rm`

Steve Loughran says:

> They should be deleted as soon as the upload completes; the close() call that 
> the AWS httpclient makes on the input stream triggers the deletion. Though 
> there aren't tests for it, as I recall.
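
A minimal sketch of the mechanism described there (illustrative only, not the 
actual S3A classes): each disk-buffered block is handed to the AWS transfer 
code as an input stream whose close() removes the backing file, so the 
temporary file should disappear as soon as the SDK finishes reading that part 
and closes the stream.

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical stand-in for the stream a block upload hands to the SDK.
    class FileDeletingInputStream extends FileInputStream {
        private final File file;

        FileDeletingInputStream(File file) throws IOException {
            super(file);
            this.file = file;
        }

        @Override
        public void close() throws IOException {
            try {
                super.close();
            } finally {
                // If this close() only happens once the whole multipart upload
                // completes (rather than per part), the block files would pile
                // up on disk, matching the behaviour reported above.
                if (!file.delete()) {
                    System.err.println("failed to delete " + file);
                }
            }
        }
    }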


