[ 
https://issues.apache.org/jira/browse/HADOOP-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847272#comment-15847272
 ] 

Steve Loughran commented on HADOOP-14028:
-----------------------------------------

BTW, on branch-2.8 the test logs also showed that during the multipart 
upload there, the stream *was not* being saved. That is, the problem exists on 
branch-2 as well. Because the temp files were all registered with 
deleteOnExit(), a clean process exit will delete the files anyway.

{code}
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG 
s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:call(501)) - Completed 
upload of FileBlock{index=2, 
destFile=target/build/test/s3a/s3ablock-0002-4995600247628610940.tmp, 
state=Upload, dataSize=2097152, limit=8388608} to part 
db1f7d786f6e0317456fac1628349973
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG 
s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:call(503)) - Stream 
statistics of OutputStreamStatistics{blocksSubmitted=2, blocksInQueue=0, 
blocksActive=1, blockUploadsCompleted=1, blockUploadsFailed=0, 
bytesPendingUpload=5734400, bytesUploaded=4751360, blocksAllocated=2, 
blocksReleased=0, blocksActivelyAllocated=2, exceptionsInMultipartFinalize=0, 
transferDuration=3028 ms, queueDuration=5 ms, averageQueueTime=2 ms, 
totalUploadDuration=3033 ms, effectiveBandwidth=1566554.566435872 bytes/s}
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG 
s3a.S3ABlockOutputStream (S3ABlockOutputStream.java:call(506)) - Closing block 
at end of multipart PUT FileBlock{index=2, 
destFile=target/build/test/s3a/s3ablock-0002-4995600247628610940.tmp, 
state=Upload, dataSize=2097152, limit=8388608}
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG s3a.S3ADataBlocks 
(S3ADataBlocks.java:enterState(167)) - FileBlock{index=2, 
destFile=target/build/test/s3a/s3ablock-0002-4995600247628610940.tmp, 
state=Upload, dataSize=2097152, limit=8388608}: entering state Closed
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG s3a.S3ADataBlocks 
(S3ADataBlocks.java:close(282)) - Closed FileBlock{index=2, 
destFile=target/build/test/s3a/s3ablock-0002-4995600247628610940.tmp, 
state=Closed, dataSize=2097152, limit=8388608}
2017-01-31 18:33:43,148 [s3a-transfer-shared-pool1-t4] DEBUG s3a.S3ADataBlocks 
(S3ADataBlocks.java:innerClose(805)) - Closing FileBlock{index=2, 
destFile=target/build/test/s3a/s3ablock-0002-4995600247628610940.tmp, 
state=Closed, dataSize=2097152, limit=8388608}
2017-01-31 18:33:43,149 [s3a-transfer-shared-pool1-t4] DEBUG s3a.S3ADataBlocks 
(S3ADataBlocks.java:closeBlock(857)) - block[2]: closeBlock()
{code}
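
To illustrate the deleteOnExit() point above, here is a minimal Java sketch. It is not the S3A code: the class and method names (TempBlockCleanup, allocateBlockFile, releaseBlockFile) are hypothetical. It only shows that a file registered with File.deleteOnExit() stays on disk for the life of the JVM unless something deletes it explicitly.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical illustration only; not taken from S3ADataBlocks.
final class TempBlockCleanup {

    /** Creates a temp file that is registered for deletion at JVM exit. */
    static File allocateBlockFile() throws IOException {
        File f = File.createTempFile("s3ablock-", ".tmp");
        // Safety net only: the file is removed on normal JVM shutdown,
        // not when the block has finished uploading.
        f.deleteOnExit();
        return f;
    }

    /** Explicit cleanup, as would be needed once a block's upload completes. */
    static boolean releaseBlockFile(File f) {
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        File block = allocateBlockFile();
        Files.write(block.toPath(), new byte[1024]);
        System.out.println("exists while JVM runs: " + block.exists());
        System.out.println("deleted explicitly: " + releaseBlockFile(block));
        System.out.println("exists after delete: " + block.exists());
    }
}
```

In a long-lived process that writes many blocks, relying on the exit hook alone would let the files accumulate until shutdown, which matches the disk usage described in the issue.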

> S3A block output streams don't clear temporary files
> ----------------------------------------------------
>
>                 Key: HADOOP-14028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14028
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.0.0-alpha2
>         Environment: JDK 8 + ORC 1.3.0 + hadoop-aws 3.0.0-alpha2
>            Reporter: Seth Fitzsimmons
>            Assignee: Steve Loughran
>         Attachments: HADOOP-14028-branch-2-001.patch
>
>
> I have `fs.s3a.fast.upload` enabled with 3.0.0-alpha2 (it's exactly what I 
> was looking for after running into the same OOM problems) and don't see it 
> cleaning up the disk-cached blocks.
> I'm generating a ~50GB file on an instance with ~6GB free when the process 
> starts. My expectation is that local copies of the blocks would be deleted 
> after those parts finish uploading, but I'm seeing more than 15 blocks in 
> /tmp (and none of them have been deleted thus far).
> I see that DiskBlock deletes temporary files when closed, but is it closed 
> after individual blocks have finished uploading or when the entire file has 
> been fully written to the FS (full upload completed, including all parts)?
> As a temporary workaround to avoid running out of space, I'm listing files, 
> sorting by atime, and deleting anything older than the first 20: `ls -ut | 
> tail -n +21 | xargs rm`
> Steve Loughran says:
> > They should be deleted as soon as the upload completes; the close() call 
> > that the AWS httpclient makes on the input stream triggers the deletion. 
> > Though there aren't tests for it, as I recall.
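
The intended behaviour described in the quoted reply, where closing the input stream triggers the deletion, can be sketched roughly as follows. This is an assumption-laden illustration, not the actual S3A implementation: DeleteOnCloseInputStream is a hypothetical name, and the sketch simply ties deletion of the backing file to close().

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

// Hypothetical illustration only; not the class used by S3A.
final class DeleteOnCloseInputStream extends FileInputStream {
    private final File file;

    DeleteOnCloseInputStream(File file) throws IOException {
        super(file);
        this.file = file;
    }

    @Override
    public void close() throws IOException {
        try {
            super.close();
        } finally {
            // Best-effort delete; returns false if the file is already gone,
            // so calling close() twice is harmless.
            file.delete();
        }
    }
}
```

With this shape the temporary file only goes away if whoever consumes the stream actually calls close() on it, so a test asserting that the file no longer exists after the upload would catch the case where that call never happens.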



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
