[ https://issues.apache.org/jira/browse/HADOOP-13203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340267#comment-15340267 ]
Steve Loughran commented on HADOOP-13203: ----------------------------------------- Here is the comparison of a small sequence of forward/backward read operations, between the random and sequential policies "random" keeps buffer sizes in requests down to a minimum, hence no bytes wasted in close or aborts {{BytesReadInClose=0, BytesDiscardedInAbort=0}} "sequential" expects a read from 0-len, requests the entire range. Forward seeks can be skipped, backward seeks trigger retreat. However, the default block size (32K) is too low for any forward skip (should we change this to something like 128K?), then there's an abort, leading to the values {{BytesReadInClose=0, BytesDiscardedInAbort=80771308}}. Those abort bytes never get read in, but they do measure how oversized the request was {code} testRandomIO_RandomPolicy: Random IO with policy "random" 2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: random 2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for reading. 2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 1 -> 2 2016-06-20 20:35:15,680 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1421)) - Getting path status for s3a://landsat-pds/scene_list.gz (scene_list.gz) 2016-06-20 20:35:15,681 [Thread-0] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 1 -> 2 2016-06-20 20:35:15,846 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file 2016-06-20 20:35:15,848 [Thread-0] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal 2016-06-20 20:35:15,850 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=2097152, length=131072, requestedStreamLen=2228224, streamPosition=0, nextReadPosition=2097152 2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 2097152-2228224 length=2228224 2016-06-20 20:35:16,069 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=131072, length=131072, requestedStreamLen=262144, streamPosition=131072, nextReadPosition=131072 2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 131072-262144 length=262144 2016-06-20 20:35:16,259 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=5242880, length=65536, requestedStreamLen=5308416, streamPosition=5242880, nextReadPosition=5242880 2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 5242880-5308416 length=5308416 2016-06-20 20:35:16,437 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=1048576, length=1048576, requestedStreamLen=2097152, streamPosition=1048576, nextReadPosition=1048576 2016-06-20 20:35:16,994 [Thread-0] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of total size 1376256 bytes: 1,141,400,611 nS 2016-06-20 20:35:16,994 [Thread-0] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz closed: close() operation; streamPos=2097152, nextReadPos=0,request range 1048576-2097152 length=2097152 2016-06-20 20:35:16,995 [Thread-0] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 829 nS 2016-06-20 20:35:16,996 [Thread-0] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 1.205761 MB/S 2016-06-20 20:35:16,997 [Thread-0] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream Statistics StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=4, Aborted=0, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=164, ReadFullyOperations=4, ReadsIncomplete=160, SeekRangeOvershot=0, BytesReadInClose=0, BytesDiscardedInAbort=0} testRandomIO_NormalPolicy: Random IO with policy "normal" 2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal 2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:open(617)) - Opening 's3a://landsat-pds/scene_list.gz' for reading. 2016-06-20 20:35:20,433 [Thread-5] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - invocations_getfilestatus += 1 -> 5 2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1421)) - Getting path status for s3a://landsat-pds/scene_list.gz (scene_list.gz) 2016-06-20 20:35:20,434 [Thread-5] DEBUG s3a.S3AFileSystem (S3AStorageStatistics.java:incrementCounter(60)) - object_metadata_requests += 1 -> 6 2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:getFileStatus(1432)) - Found exact file: normal file 2016-06-20 20:35:20,597 [Thread-5] DEBUG s3a.S3AFileSystem (S3AFileSystem.java:setInputPolicy(566)) - Setting input strategy: normal 2016-06-20 20:35:20,598 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=2097152, length=131072, requestedStreamLen=22666811, streamPosition=0, nextReadPosition=2097152 2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=2228224, nextReadPos=131072,request range 2097152-22666811 length=22666811 2016-06-20 20:35:20,792 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=131072, length=131072, requestedStreamLen=22666811, streamPosition=131072, nextReadPosition=131072 2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=262144, nextReadPos=5242880,request range 131072-22666811 length=22666811 2016-06-20 20:35:22,117 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=5242880, length=65536, requestedStreamLen=22666811, streamPosition=5242880, nextReadPosition=5242880 2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: seekInStream(); streamPos=5308416, nextReadPos=1048576,request range 5242880-22666811 length=22666811 2016-06-20 20:35:23,302 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:reopen(145)) - reopen(s3a://landsat-pds/scene_list.gz) for read from new offset at targetPos=1048576, length=1048576, requestedStreamLen=22666811, streamPosition=1048576, nextReadPosition=1048576 2016-06-20 20:35:25,240 [Thread-5] INFO contract.ContractTestUtils (ContractTestUtils.java:end(1262)) - Duration of Time to execute 4 reads of total size 1376256 bytes: 4,641,800,492 nS 2016-06-20 20:35:25,240 [Thread-5] DEBUG s3a.S3AFileSystem (S3AInputStream.java:closeStream(470)) - Stream s3a://landsat-pds/scene_list.gz aborted: close() operation; streamPos=2097152, nextReadPos=0,request range 1048576-22666811 length=22666811 2016-06-20 20:35:25,241 [Thread-5] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logTimePerIOP(165)) - Time per byte read: 3,372 nS 2016-06-20 20:35:25,241 [Thread-5] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:executeRandomIO(388)) - Effective bandwidth 0.296492 MB/S 2016-06-20 20:35:25,241 [Thread-5] INFO scale.TestS3AInputStreamPerformance (TestS3AInputStreamPerformance.java:logStreamStatistics(292)) - Stream Statistics StreamStatistics{OpenOperations=4, CloseOperations=4, Closed=0, Aborted=4, SeekOperations=2, ReadExceptions=0, ForwardSeekOperations=0, BackwardSeekOperations=2, BytesSkippedOnSeek=0, BytesBackwardsOnSeek=6356992, BytesRead=1376256, BytesRead excluding skipped=1376256, ReadOperations=158, ReadFullyOperations=4, ReadsIncomplete=154, SeekRangeOvershot=0, BytesReadInClose=0, BytesDiscardedInAbort=80771308} 2016-06-20 20:35:25,241 [Thread-5] INFO scale.S3AScaleTestBase (S3AScaleTestBase.java:describe(155)) - {code} > S3a: Consider reducing the number of connection aborts by setting correct > length in s3 request > ---------------------------------------------------------------------------------------------- > > Key: HADOOP-13203 > URL: https://issues.apache.org/jira/browse/HADOOP-13203 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 > Affects Versions: 2.8.0 > Reporter: Rajesh Balamohan > Assignee: Rajesh Balamohan > Attachments: HADOOP-13203-branch-2-001.patch, > HADOOP-13203-branch-2-002.patch, HADOOP-13203-branch-2-003.patch, > HADOOP-13203-branch-2-004.patch, HADOOP-13203-branch-2-005.patch, > HADOOP-13203-branch-2-006.patch, HADOOP-13203-branch-2-007.patch, > stream_stats.tar.gz > > > Currently file's "contentLength" is set as the "requestedStreamLen", when > invoking S3AInputStream::reopen(). As a part of lazySeek(), sometimes the > stream had to be closed and reopened. But lots of times the stream was closed > with abort() causing the internal http connection to be unusable. This incurs > lots of connection establishment cost in some jobs. It would be good to set > the correct value for the stream length to avoid connection aborts. > I will post the patch once aws tests passes in my machine. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org