[
https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-13203:
------------------------------------
Attachment: HADOOP-13203-branch-2-008.patch
Patch 008; tested against S3 Ireland.
This revision adds a test that demonstrates what I suspected: reads spanning
block boundaries were going to have problems. It also has the fix, which is to
always call {{seekInStream(pos, len)}} before a read, even if
{{targetPos==currentPos}}, and in that situation to close the current stream if
currentPos is at the end of the current request range (i.e. there's no
seek, but no data either). The test does block-spanning reads on a file built
up with the byte at each position being {{(position % 64)}}; the tests use this
to verify that the bytes returned really are the bytes in the file at the
specific read positions.
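To illustrate the verification idea, here is a minimal standalone sketch, not the code in the patch. The class name, the helper names, and the 1024-byte block size are all hypothetical; only the {{(position % 64)}} pattern comes from the description above. Because every byte is a function of its offset, a read at any position can be checked without keeping a second copy of the file.

```java
import java.util.Arrays;

public class BlockSpanningReadSketch {

    /** Build a dataset where the byte at each offset is (offset % 64). */
    static byte[] dataset(int len) {
        byte[] data = new byte[len];
        for (int i = 0; i < len; i++) {
            data[i] = (byte) (i % 64);
        }
        return data;
    }

    /**
     * Check that buf[0..len) holds the bytes expected at file offsets
     * startPos..startPos+len. Returns the index of the first mismatch,
     * or -1 if every byte matches the pattern.
     */
    static int firstMismatch(byte[] buf, long startPos, int len) {
        for (int i = 0; i < len; i++) {
            if (buf[i] != (byte) ((startPos + i) % 64)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int blockSize = 1024;                  // hypothetical block size
        byte[] file = dataset(4 * blockSize);

        // Simulate a read that starts 10 bytes before a block boundary
        // and ends 10 bytes after it.
        long pos = blockSize - 10;
        int len = 20;
        byte[] buf = Arrays.copyOfRange(file, (int) pos, (int) pos + len);

        System.out.println("first mismatch: " + firstMismatch(buf, pos, len));
    }
}
```

In a real test the buffer would be filled by a positioned read on the stream under test rather than by copying from the in-memory array.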
BTW, note that some of the -Len fields in the input stream now refer to range
start and finish; "Len" isn't appropriate now that the range of the HTTP
request may be less than the length of the actual blob. It was getting confusing.
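A toy model of the range bookkeeping may help; it is a sketch under stated assumptions, not the S3AInputStream implementation. The class, its fields ({{rangeStart}}, {{rangeFinish}}, {{reopenCount}}) and the fixed per-request range size are hypothetical, and {{seekInStream}} here takes only a position. The model tracks a request range by start and finish rather than by a length, and drops the current "stream" when the read position reaches the end of that range, even though no seek was requested.

```java
public class RangedReaderSketch {
    private final byte[] blob;      // stands in for the remote object
    private final int rangeSize;    // hypothetical length of each request range
    private long pos;               // next position to read from
    private long rangeStart = -1;   // start of current request range (inclusive)
    private long rangeFinish = -1;  // end of current request range (exclusive)
    private boolean open;           // whether a simulated stream is open
    int reopenCount;                // number of simulated ranged requests

    RangedReaderSketch(byte[] blob, int rangeSize) {
        this.blob = blob;
        this.rangeSize = rangeSize;
    }

    /** Called before every read, even when targetPos == pos. */
    private void seekInStream(long targetPos) {
        if (open && (targetPos < rangeStart || targetPos >= rangeFinish)) {
            // The current request cannot deliver this byte. This covers the
            // "no seek, but no data either" case at the end of the range.
            open = false;
        }
        // A real stream would skip bytes or reopen; the model just moves pos.
        pos = targetPos;
    }

    /** Open a new simulated request starting at pos if none is open. */
    private void reopenIfNeeded() {
        if (!open) {
            rangeStart = pos;
            rangeFinish = Math.min(pos + rangeSize, blob.length);
            open = true;
            reopenCount++;
        }
    }

    /** Read up to len bytes starting at targetPos; returns bytes read. */
    int read(long targetPos, byte[] buf, int off, int len) {
        int total = 0;
        long p = targetPos;
        while (total < len && p < blob.length) {
            seekInStream(p);
            reopenIfNeeded();
            int n = (int) Math.min(len - total, rangeFinish - pos);
            System.arraycopy(blob, (int) pos, buf, off + total, n);
            pos += n;
            total += n;
            p = pos;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] blob = new byte[256];
        for (int i = 0; i < blob.length; i++) {
            blob[i] = (byte) (i % 64);
        }
        RangedReaderSketch reader = new RangedReaderSketch(blob, 100);
        byte[] buf = new byte[20];
        reader.read(0, buf, 0, 10);   // opens range [0, 100)
        reader.read(90, buf, 0, 20);  // spans the end of that range
        System.out.println("requests issued: " + reader.reopenCount);
    }
}
```

The second read crosses the end of the first range, so the model issues a second request partway through instead of returning short or stale data.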
> S3a: Consider reducing the number of connection aborts by setting correct
> length in s3 request
> ----------------------------------------------------------------------------------------------
>
> Key: HADOOP-13203
> URL: https://issues.apache.org/jira/browse/HADOOP-13203
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 2.8.0
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: HADOOP-13203-branch-2-001.patch,
> HADOOP-13203-branch-2-002.patch, HADOOP-13203-branch-2-003.patch,
> HADOOP-13203-branch-2-004.patch, HADOOP-13203-branch-2-005.patch,
> HADOOP-13203-branch-2-006.patch, HADOOP-13203-branch-2-007.patch,
> HADOOP-13203-branch-2-008.patch, stream_stats.tar.gz
>
>
> Currently the file's "contentLength" is set as the "requestedStreamLen" when
> invoking S3AInputStream::reopen(). As a part of lazySeek(), the stream
> sometimes has to be closed and reopened. But the stream is often closed
> with abort(), leaving the internal HTTP connection unusable. This incurs
> a lot of connection establishment cost in some jobs. It would be good to set
> the correct value for the stream length to avoid connection aborts.
> I will post the patch once the AWS tests pass on my machine.