[
https://issues.apache.org/jira/browse/HADOOP-11570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14316416#comment-14316416
]
Dan Hecht commented on HADOOP-11570:
------------------------------------
Correct, the seek case already uses abort(). Additionally, the
S3ObjectInputStream.abort() documentation makes it clear that this is the
expected tradeoff between abort() and close():
{code}
/**
* {@inheritDoc}
*
* Aborts the underlying http request without reading any more data and
* closes the stream.
* <p>
* By default Apache {@link HttpClient} tries to reuse http connections by
* reading to the end of an attached input stream on
* {@link InputStream#close()}. This is efficient from a socket pool
* management perspective, but for objects with large payloads can incur
* significant overhead while bytes are read from s3 and discarded. It's up
* to clients to decide when to take the performance hit implicit in not
* reusing an http connection in order to not read unnecessary information
* from S3.
*
* @see EofSensorInputStream
*/
{code}
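The tradeoff described above can be sketched as a small decision helper: abort the HTTP stream when the reader has not consumed the object to contentLength, and close it (allowing connection reuse) only when nothing remains to drain. This is a minimal illustration, not the actual Hadoop patch; the ObjectStream interface and names like shouldAbort, pos, and contentLength are assumptions for the example.

```java
// Hypothetical sketch of the close-vs-abort decision (not the real
// S3AInputStream code; ObjectStream stands in for S3ObjectInputStream).
interface ObjectStream {
    void close();  // drains remaining bytes so the http connection can be reused
    void abort();  // drops the connection without reading any more data
}

class CloseLogic {
    /** Abort unless the reader already consumed the object to its end. */
    static boolean shouldAbort(long pos, long contentLength) {
        return pos < contentLength;
    }

    static void closeStream(ObjectStream in, long pos, long contentLength) {
        if (shouldAbort(pos, contentLength)) {
            in.abort();  // e.g. read 1 byte of a 1 GB object: skip the 1 GB drain
        } else {
            in.close();  // at EOF: nothing left to discard, keep the connection
        }
    }
}
```

With this policy, closing after reading only the first byte of a 1 GB object aborts the request instead of transferring the remaining ~1 GB, at the cost of not returning the connection to the pool.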
> S3AInputStream.close() downloads the remaining bytes of the object from S3
> --------------------------------------------------------------------------
>
> Key: HADOOP-11570
> URL: https://issues.apache.org/jira/browse/HADOOP-11570
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.6.0
> Reporter: Dan Hecht
> Attachments: HADOOP-11570-001.patch
>
>
> Currently, S3AInputStream.close() calls S3Object.close(). But
> S3Object.close() reads the remaining bytes of the S3 object, potentially
> transferring a lot of bytes from S3 that are then discarded. Instead, the wrapped
> stream should be aborted to avoid transferring discarded bytes (unless the
> preceding read() finished at contentLength). For example, reading only the
> first byte of a 1 GB object and then closing the stream will result in all 1
> GB transferred from S3.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)