[
https://issues.apache.org/jira/browse/HADOOP-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254168#comment-15254168
]
Steve Loughran commented on HADOOP-13047:
-----------------------------------------
Regarding the patch: I don't think we need anything that tries to adapt to
bandwidth, at least not initially. Having something you can preconfigure
should be enough at first.
Why? For short-lived streams, you aren't going to have any statistics yet.
But you may know, across applications and instances of the app, whether you
are near to or far from S3, so you can choose some values and see what works.
> S3a Forward seek in stream length to be configurable
> ----------------------------------------------------
>
> Key: HADOOP-13047
> URL: https://issues.apache.org/jira/browse/HADOOP-13047
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Attachments: HADOOP-13047.WIP.patch
>
>
> Even with lazy seek, tests can show that sometimes a short-distance forward
> seek is triggering a close + reopen, because the threshold for the seek is
> simply available bytes in the inner stream.
> A configurable threshold would allow data to be read and discarded before
> that seek. This should be beneficial over long-haul networks as the time to
> set up the TCP channel is high, and TCP-slow-start means that the ramp up of
> bandwidth is slow. In such deployments, it will be better to read forward than
> re-open, though the exact "best" number will vary with client and endpoint.
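
A minimal sketch of the decision described in the issue text, not the code in
HADOOP-13047.WIP.patch or the real S3A input stream. The class name, method
names and the `threshold` parameter are all hypothetical; the sketch only
assumes that the current check is "distance <= bytes available in the inner
stream" and that a configured threshold would widen it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ForwardSeekSketch {

    /**
     * Decide whether a seek from pos to target can be served by reading and
     * discarding bytes on the already-open stream, instead of close + reopen.
     * "threshold" is a hypothetical preconfigured value, in bytes.
     */
    public static boolean shouldReadForward(long pos, long target,
                                            long available, long threshold) {
        long diff = target - pos;
        if (diff < 0) {
            return false; // backward seek: the stream has to be reopened
        }
        // Without a threshold, only bytes already available would qualify.
        return diff <= Math.max(available, threshold);
    }

    /** Read and discard up to n bytes; returns how many were discarded. */
    public static long discard(InputStream in, long n) throws IOException {
        byte[] buf = new byte[8192];
        long remaining = n;
        while (remaining > 0) {
            int read = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (read < 0) {
                break; // end of stream reached before n bytes were skipped
            }
            remaining -= read;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        long pos = 0, target = 16 * 1024, available = 4 * 1024;

        // Threshold of 0 models "available bytes only": seek would reopen.
        System.out.println(shouldReadForward(pos, target, available, 0));

        // A 64 KB threshold lets the same seek be served by reading forward.
        System.out.println(shouldReadForward(pos, target, available, 64 * 1024));

        InputStream in = new ByteArrayInputStream(new byte[32 * 1024]);
        System.out.println(discard(in, target - pos));
    }
}
```

A client far from S3 would pick a larger threshold than one in the same
region, which is the "choose some values and see what works" approach
suggested in the comment above.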