[ https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15848373#comment-15848373 ]

Steve Loughran commented on HADOOP-14028:
-----------------------------------------

Although we have that cleanup patch, I'm still worried about streams not being 
closed when blocks are sent up with the transfer manager. I think we should be 
explicitly closing them ourselves, just to make sure.
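
To make that concrete, something along these lines (a sketch only; {{uploadBlockData()}} and {{block}} are placeholder names, not the actual S3ABlockOutputStream/S3ADataBlocks API): wrap the upload in a try/finally and close the block ourselves, so the buffer file is removed even if the SDK never closes the stream it was handed.
{code}
// Sketch only: uploadBlockData() and block are illustrative placeholders.
try {
  uploadBlockData(block);   // single PUT or a multipart part upload
} finally {
  // Don't rely on the AWS SDK / transfer manager closing the wrapped
  // InputStream; closing the block here guarantees the temporary buffer
  // file on disk is deleted.
  block.close();
}
{code}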

I've tweaked the code so that the disk block's input stream logs a full stack 
trace at debug in close(); that makes the closure easier to spot in the logs 
and shows where it kicks in.
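
Roughly what the tweak looks like (a sketch; the field names {{index}} and {{bufferFile}} are made up here, and the actual S3ADataBlocks change may differ):
{code}
// Sketch of the debug logging added to the disk block's stream close().
@Override
public void close() throws IOException {
  if (LOG.isDebugEnabled()) {
    LOG.debug("Block[{}] closing stream of buffer file {}", index, bufferFile);
    // Logging a throwable at debug captures the full call stack of whatever
    // triggered the close(), which is what produces the traces below.
    LOG.debug("stack: ", new IOException("Output stream closed"));
  }
  super.close();
}
{code}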

Single blob PUT:
{code}
2017-02-01 13:28:51,451 [s3a-transfer-shared-pool1-t2] DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:close(884)) - Block[1] closing stream of buffer file target/build/test/s3a/s3ablock-0001-8454965610295305282.tmp
2017-02-01 13:28:51,452 [s3a-transfer-shared-pool1-t2] DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:close(889)) - stack:
java.io.IOException: Output stream closed
        at org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlock$DiskBlockInputStream.close(S3ADataBlocks.java:888)
        at com.amazonaws.internal.ReleasableInputStream.doRelease(ReleasableInputStream.java:84)
        at com.amazonaws.internal.ReleasableInputStream.close(ReleasableInputStream.java:68)
        at com.amazonaws.internal.SdkFilterInputStream.close(SdkFilterInputStream.java:89)
        at com.amazonaws.internal.SdkFilterInputStream.close(SdkFilterInputStream.java:89)
        at java.io.BufferedInputStream.close(BufferedInputStream.java:483)
        at com.amazonaws.internal.SdkBufferedInputStream.close(SdkBufferedInputStream.java:93)
        at com.amazonaws.internal.SdkFilterInputStream.close(SdkFilterInputStream.java:89)
        at com.amazonaws.event.ProgressInputStream.close(ProgressInputStream.java:182)
        at com.amazonaws.util.IOUtils.closeQuietly(IOUtils.java:71)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:322)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
        at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1472)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.putObjectDirect(S3AFileSystem.java:1090)
        at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$1.call(S3ABlockOutputStream.java:397)
        at org.apache.hadoop.fs.s3a.S3ABlockOutputStream$1.call(S3ABlockOutputStream.java:392)
        at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
        at org.apache.hadoop.fs.s3a.SemaphoredDelegatingExecutor$CallableWithPermitRelease.call(SemaphoredDelegatingExecutor.java:222)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
2017-02-01 13:28:51,455 [s3a-transfer-shared-pool1-t2] DEBUG s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:call(399)) - Closing block {} after put operation
2017-02-01 13:28:51,455 [s3a-transfer-shared-pool1-t2] DEBUG s3a.S3ADataBlocks (S3ADataBlocks.java:enterState(167)) - FileBlock{index=1, destFile=target/build/test/s3a/s3ablock-0001-8454965610295305282.tmp, state=Upload, dataSize=16, limit=5242880}: entering state Closed
{code}

There is no such stack trace anywhere in the test run for the multipart upload 
via the transfer manager: its streams are not being closed.

> S3A block output streams don't clear temporary files
> ----------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14028-branch-2-001.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.


