[
https://issues.apache.org/jira/browse/HADOOP-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316694#comment-14316694
]
Steve Loughran commented on HADOOP-11570:
-----------------------------------------
OK. Would there be any benefit in having the choice of action move from {{pos
== contentLength}} to {{(contentLength - pos) <= threshold}}, with some small
threshold like 1-4K? That way, it'd be cleaner to close files near the end of
the stream.
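For illustration only, the threshold check could look something like this inside {{close()}}. This is a sketch, not the patch: {{CLOSE_THRESHOLD}} is a made-up constant, and {{wrappedStream}}, {{pos}}, {{contentLength}} and {{closed}} are assumed to be the existing S3AInputStream fields.
{code:java}
// Sketch only: assumes S3AInputStream's existing wrappedStream (an
// S3ObjectInputStream), pos, contentLength and closed fields.
// CLOSE_THRESHOLD is a hypothetical constant, e.g. 4 KB.
private static final long CLOSE_THRESHOLD = 4 * 1024;

@Override
public synchronized void close() throws IOException {
  if (!closed) {
    closed = true;
    if (wrappedStream != null) {
      long remaining = contentLength - pos;
      if (remaining <= CLOSE_THRESHOLD) {
        // Near the end of the object: drain the few remaining bytes so the
        // HTTP connection can be returned to the pool and reused.
        wrappedStream.close();
      } else {
        // Far from the end: abort the underlying request so the client
        // doesn't download the rest of the object only to discard it.
        wrappedStream.abort();
      }
      wrappedStream = null;
    }
  }
}
{code}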
I don't have enough stats on single-JVM read operations to know whether that
would help. Making forward seek() operations more efficient is more critical, as
the general sequence of an analytics read of a column-structured format (ORC)
is:
# open the blob
# seek to the start of the "block"/allocated subset of data
# read through, skipping regions that don't contain columns or ranges of interest
# stop at the end of the allocated data subset
# close the stream
This patch will address the stream close operation.
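For reference, that read pattern looks roughly like this against the FileSystem API. Illustrative only: the path, stripe offsets and buffer size are placeholders, not taken from a real workload.
{code:java}
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ColumnarReadPattern {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // 1. open the blob
    FileSystem fs = FileSystem.get(URI.create("s3a://bucket/"), conf);
    try (FSDataInputStream in = fs.open(new Path("s3a://bucket/data.orc"))) {
      long stripeStart = 128L * 1024 * 1024;  // placeholder stripe offsets
      long stripeEnd = 192L * 1024 * 1024;
      // 2. seek to the start of the allocated subset
      in.seek(stripeStart);
      byte[] buf = new byte[64 * 1024];
      long pos = stripeStart;
      // 3./4. read through to the end of the subset, skipping as needed
      while (pos < stripeEnd) {
        int n = in.read(buf, 0, (int) Math.min(buf.length, stripeEnd - pos));
        if (n < 0) {
          break;
        }
        pos += n;
        // ... decode the columns of interest; seek() past unwanted ranges ...
      }
    } // 5. close the stream -- the step whose cost this issue is about
  }
}
{code}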
> S3AInputStream.close() downloads the remaining bytes of the object from S3
> --------------------------------------------------------------------------
>
> Key: HADOOP-11570
> URL: https://issues.apache.org/jira/browse/HADOOP-11570
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.6.0
> Reporter: Dan Hecht
> Attachments: HADOOP-11570-001.patch
>
>
> Currently, S3AInputStream.close() calls S3Object.close(). But,
> S3Object.close() will read the remaining bytes of the S3 object, potentially
> transferring a lot of bytes from S3 that are discarded. Instead, the wrapped
> stream should be aborted to avoid transferring discarded bytes (unless the
> preceding read() finished at contentLength). For example, reading only the
> first byte of a 1 GB object and then closing the stream will result in all 1
> GB transferred from S3.