[ 
https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340267#comment-15340267
 ] 

Steve Loughran commented on HADOOP-13203:
-----------------------------------------

Here is the comparison of a small sequence of forward/backward read operations, 
between the random and sequential policies

"random" keeps buffer sizes in requests down to a minimum, hence no bytes 
wasted in close or aborts {{BytesReadInClose=0, BytesDiscardedInAbort=0}}

"sequential" expects a read from 0-len, requests the entire range. Forward 
seeks can be skipped, backward seeks trigger retreat. However, the default 
block size (32K) is too low for any forward skip (should we change this to 
something like 128K?), then there's an abort, leading to the values 
{{BytesReadInClose=0, BytesDiscardedInAbort=80771308}}. Those abort bytes never 
get read in, but they do measure how oversized the request was

{code}
testRandomIO_RandomPolicy: Random IO with policy "random"

2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: random
2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for 
reading.
2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 
1  ->  2
2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1421)) - Getting path status for 
s3a://landsat-pds/scene_list.gz  (scene_list.gz)
2016-06-20 20:35:15,681 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 
1  ->  2
2016-06-20 20:35:15,846 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file
2016-06-20 20:35:15,848 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
2016-06-20 20:35:15,850 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=2097152, length=131072, 
requestedStreamLen=2228224, streamPosition=0, nextReadPosition=2097152
2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
closed: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 
2097152-2228224 length=2228224
2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=131072, length=131072, 
requestedStreamLen=262144, streamPosition=131072, nextReadPosition=131072
2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
closed: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 
131072-262144 length=262144
2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=5242880, length=65536, 
requestedStreamLen=5308416, streamPosition=5242880, nextReadPosition=5242880
2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
closed: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 
5242880-5308416 length=5308416
2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=1048576, length=1048576, 
requestedStreamLen=2097152, streamPosition=1048576, nextReadPosition=1048576
2016-06-20 20:35:16,994 [Thread-0] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of 
total size 1376256 bytes: 1,141,400,611 nS
2016-06-20 20:35:16,994 [Thread-0] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
closed: close() operation; streamPos=2097152, nextReadPos=0,request range 
1048576-2097152 length=2097152
2016-06-20 20:35:16,995 [Thread-0] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 
829 nS
2016-06-20 20:35:16,996 [Thread-0] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 
1.205761 MB/S
2016-06-20 20:35:16,997 [Thread-0] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream 
Statistics
StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=4, Aborted=0, 
SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, 
BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, 
BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=164, 
ReadFullyOperations=4, ReadsIncomplete=160, SeekRangeOvershot=0, 
BytesReadInClose=0, BytesDiscardedInAbort=0}


testRandomIO_NormalPolicy: Random IO with policy "normal"

2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for 
reading.
2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 
1  ->  5
2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1421)) - Getting path status for 
s3a://landsat-pds/scene_list.gz  (scene_list.gz)
2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 
1  ->  6
2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file
2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal
2016-06-20 20:35:20,598 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=2097152, length=131072, 
requestedStreamLen=22666811, streamPosition=0, nextReadPosition=2097152
2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
aborted: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 
2097152-22666811 length=22666811
2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=131072, length=131072, 
requestedStreamLen=22666811, streamPosition=131072, nextReadPosition=131072
2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
aborted: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 
131072-22666811 length=22666811
2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=5242880, length=65536, 
requestedStreamLen=22666811, streamPosition=5242880, nextReadPosition=5242880
2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
aborted: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 
5242880-22666811 length=22666811
2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for 
read from new offset at targetPos=1048576, length=1048576, 
requestedStreamLen=22666811, streamPosition=1048576, nextReadPosition=1048576
2016-06-20 20:35:25,240 [Thread-5] INFO  contract.ContractTestUtils 
(ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of 
total size 1376256 bytes: 4,641,800,492 nS
2016-06-20 20:35:25,240 [Thread-5] DEBUG s3a.S3AFileSystem 
(S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz 
aborted: close() operation; streamPos=2097152, nextReadPos=0,request range 
1048576-22666811 length=22666811
2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 
3,372 nS
2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 
0.296492 MB/S
2016-06-20 20:35:25,241 [Thread-5] INFO  scale.TestS3AInputStreamPerformance 
(TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream 
Statistics
StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=0, Aborted=4, 
SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, 
BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, 
BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=158, 
ReadFullyOperations=4, ReadsIncomplete=154, SeekRangeOvershot=0, 
BytesReadInClose=0, BytesDiscardedInAbort=80771308}
2016-06-20 20:35:25,241 [Thread-5] INFO  scale.S3AScaleTestBase 
(S3AScaleTestBase.java:describe(155)) - 

{code}

> S3a: Consider reducing the number of connection aborts by setting correct 
> length in s3 request
> ----------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13203
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13203
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: HADOOP-13203-branch-2-001.patch, 
> HADOOP-13203-branch-2-002.patch, HADOOP-13203-branch-2-003.patch, 
> HADOOP-13203-branch-2-004.patch, HADOOP-13203-branch-2-005.patch, 
> HADOOP-13203-branch-2-006.patch, HADOOP-13203-branch-2-007.patch, 
> stream_stats.tar.gz
>
>
> Currently file's "contentLength" is set as the "requestedStreamLen", when 
> invoking S3AInputStream::reopen().  As a part of lazySeek(), sometimes the 
> stream had to be closed and reopened. But lots of times the stream was closed 
> with abort() causing the internal http connection to be unusable. This incurs 
> lots of connection establishment cost in some jobs.  It would be good to set 
> the correct value for the stream length to avoid connection aborts. 
> I will post the patch once aws tests passes in my machine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to