[
https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314915#comment-15314915
]
Chris Nauroth commented on HADOOP-13203:
----------------------------------------
# The comment "In case this is set to contentLength, expect lots of connection
closes with abort..." is not entirely accurate. I see how this is true for
usage that seeks backward, but it's not true for usage that seeks forward a
lot, as demonstrated during the HADOOP-13028 review. (More on this topic
below.)
# Would you please revert the change in {{S3AInputStream#setReadahead}}? This
is a public API, and the contract of that API is defined in interface
{{CanSetReadahead}}. It states that callers are allowed to pass {{null}} to
reset the read-ahead to its default value. This matches the behavior
implemented by HDFS. The logic currently in S3A implements it correctly, but
with this patch applied, it would cause a {{NullPointerException}} if a caller
passed {{null}}.
# In {{TestS3AInputStreamPerformance}}, I see why these changes were required
to make the tests pass, but it highlights that this change partly reverts what
was achieved in HADOOP-13028 to minimize reopens on forward seeks. Before this
patch, {{testReadAheadDefault}} generated 1 open. After applying the patch, I
see it generating 343 opens. It seems we can't fully optimize forward seek
without harming backwards seek due to the unintended aborts. I suppose one
option would be to introduce an optional advice API, similar to calling
{{fadvise(FADV_SEQUENTIAL)}} that forward-seeking applications could call.
That would be a much bigger change though. I don't see a way to achieve
anything better right now, although it's probably good that you changed
{{closeStream}} to consider read-ahead instead of the old {{CLOSE_THRESHOLD}}
to determine whether or not to abort. Steve, do you have any further thoughts
on this?
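
To illustrate the {{setReadahead}} point above, here is a minimal sketch of null handling that follows the {{CanSetReadahead}} contract. It is not the actual S3A code: the class name {{ReadaheadSketch}}, the constant {{DEFAULT_READAHEAD_RANGE}}, and its 64 KB value are assumptions made for the example.

```java
// Hypothetical stand-in for an input stream; NOT the real S3AInputStream.
class ReadaheadSketch {
    // Assumed default for illustration; the real default comes from configuration.
    static final long DEFAULT_READAHEAD_RANGE = 64 * 1024;

    private long readahead = DEFAULT_READAHEAD_RANGE;

    // Follows the CanSetReadahead contract: a null argument resets the
    // read-ahead to its default instead of being unboxed (which would
    // throw a NullPointerException).
    public synchronized void setReadahead(Long newValue) {
        if (newValue == null) {
            this.readahead = DEFAULT_READAHEAD_RANGE;
        } else {
            if (newValue < 0) {
                throw new IllegalArgumentException("Negative readahead value");
            }
            this.readahead = newValue;
        }
    }

    public synchronized long getReadahead() {
        return readahead;
    }
}
```

A caller that passes {{null}} after setting an explicit value gets the default back, which is the behavior a change to this method would need to preserve.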
> S3a: Consider reducing the number of connection aborts by setting correct
> length in s3 request
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-13203
> URL: https://issues.apache.org/jira/browse/HADOOP-13203
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Priority: Minor
> Attachments: HADOOP-13203-branch-2-001.patch,
> HADOOP-13203-branch-2-002.patch
>
>
> Currently the file's "contentLength" is used as the "requestedStreamLen" when
> invoking S3AInputStream::reopen(). As part of lazySeek(), the stream
> sometimes has to be closed and reopened, and it is often closed with
> abort(), which leaves the underlying HTTP connection unusable. This incurs
> significant connection establishment cost in some jobs. It would be good to set
> the correct value for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.
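
The idea in the description can be sketched roughly as follows. This is a hedged illustration and not the attached patch: the class {{CloseDecisionSketch}}, the methods {{requestEnd}} and {{shouldAbort}}, and the exact formulas are assumptions chosen to show the trade-off between bounding the request length and aborting the connection.

```java
// Hypothetical helper; names and formulas are illustrative assumptions only.
class CloseDecisionSketch {

    // End offset to request when reopening at targetPos: at least `length`
    // bytes or the read-ahead range, but never past the end of the object.
    static long requestEnd(long targetPos, long length,
                           long contentLength, long readahead) {
        long wanted = targetPos + Math.max(readahead, length);
        return Math.min(contentLength, wanted);
    }

    // When closing, abort the connection only if more bytes remain in the
    // current request than the read-ahead range. Otherwise the remainder is
    // small enough to drain, so the HTTP connection can be reused.
    static boolean shouldAbort(long remainingInRequest, long readahead) {
        return remainingInRequest > readahead;
    }
}
```

Under these assumptions, a request bounded near the read-ahead range rarely leaves enough unread bytes to force an abort, but a forward seek past the requested range needs a reopen. A request for the full content length has the opposite profile.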