[ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314915#comment-15314915 ]

Chris Nauroth commented on HADOOP-13203:
----------------------------------------

# The comment "In case this is set to contentLength, expect lots of connection 
closes with abort..." is not entirely accurate.  I see how this is true for 
usage that seeks backward, but it's not true for usage that seeks forward a 
lot, as demonstrated during the HADOOP-13028 review.  (More on this topic 
below.)
# Would you please revert the change in {{S3AInputStream#setReadahead}}?  This 
is a public API, and the contract of that API is defined in interface 
{{CanSetReadahead}}.  It states that callers are allowed to pass {{null}} to 
reset the read-ahead to its default value.  This matches the behavior 
implemented by HDFS.  The logic currently in S3A implements it correctly, but 
with this patch applied, it would throw a {{NullPointerException}} if a caller 
passed {{null}}.  (See the first sketch after this list.)
# In {{TestS3AInputStreamPerformance}}, I see why these changes were required 
to make the tests pass, but it highlights that this change partly reverts what 
was achieved in HADOOP-13028 to minimize reopens on forward seeks.  Before this 
patch, {{testReadAheadDefault}} generated 1 open.  After applying the patch, I 
see it generating 343 opens.  It seems we can't fully optimize forward seek 
without harming backward seek, due to the unintended aborts.  I suppose one 
option would be to introduce an optional advice API, similar to 
{{fadvise(FADV_SEQUENTIAL)}}, which forward-seeking applications could call.  
That would be a much bigger change though.  I don't see a way to achieve 
anything better right now, although it's probably good that you changed 
{{closeStream}} to consider read-ahead instead of the old {{CLOSE_THRESHOLD}} 
when deciding whether or not to abort (see the second sketch after this 
list).  Steve, do you have any further thoughts on this?
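
For reference on item 2, here is a minimal sketch of the null-safe handling the 
{{CanSetReadahead}} contract expects.  It is illustrative only; the 
{{readahead}} field and the {{Constants.DEFAULT_READAHEAD_RANGE}} default are 
assumptions for the sketch, not a quote of the current code or of the patch.

{code:java}
// Sketch only: assumes a "readahead" field on S3AInputStream and a default
// read-ahead constant; the names here are illustrative.
@Override
public synchronized void setReadahead(Long readahead) {
  if (readahead == null) {
    // CanSetReadahead contract: null resets read-ahead to the default value.
    this.readahead = Constants.DEFAULT_READAHEAD_RANGE;
  } else {
    Preconditions.checkArgument(readahead >= 0,
        "Negative readahead value: " + readahead);
    this.readahead = readahead;
  }
}
{code}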
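
And on item 3, this is roughly the close-versus-abort decision I read the patch 
as making in {{closeStream}}, sketched with assumed {{pos}}, {{readahead}} and 
{{wrappedStream}} fields rather than copied from the patch itself:

{code:java}
// Sketch of the close-vs-abort choice based on read-ahead instead of the
// old CLOSE_THRESHOLD; "length" is the end of the current ranged request.
private void closeStream(String reason, long length) {
  if (wrappedStream == null) {
    return;
  }
  long remaining = length - pos;
  // Draining a small remainder keeps the HTTP connection reusable;
  // aborting breaks the connection and forces a new one on the next read.
  boolean shouldAbort = remaining > readahead;
  if (!shouldAbort) {
    try {
      wrappedStream.close();
    } catch (IOException e) {
      // If draining the remaining bytes fails, fall back to aborting.
      shouldAbort = true;
    }
  }
  if (shouldAbort) {
    wrappedStream.abort();
  }
  wrappedStream = null;
}
{code}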

> S3a: Consider reducing the number of connection aborts by setting correct 
> length in s3 request
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13203-branch-2-001.patch, 
> HADOOP-13203-branch-2-002.patch
>
>
> Currently the file's "contentLength" is set as the "requestedStreamLen" when 
> invoking S3AInputStream::reopen().  As part of lazySeek(), the stream 
> sometimes has to be closed and reopened, but it is often closed with abort(), 
> leaving the underlying HTTP connection unusable.  This incurs significant 
> connection establishment cost in some jobs.  It would be good to set the 
> correct value for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.


