[ 
https://issues.apache.org/jira/browse/HADOOP-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253961#comment-15253961
 ] 

Steve Loughran commented on HADOOP-13047:
-----------------------------------------

Rajesh, here's the output of my instrumented test run, showing the stats 
collected. These statistics will be accessible at the stream level, so can be 
used to test whether the extended readahead has actually worked. I also add the 
bytes skipped to the count of bytes read, so you can implicitly work out if a 
stream skipped, though it's a lot trickier than just getting the stats on 
{{bytesSkippedOnSeek}}

{code}
===== TEST OUTPUT FOR o.a.s.cloud.s3.S3aIOSuite: 'SeekReadFully: Cost of seek 
and ReadFully' =====

2016-04-22 14:46:49,509 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of stat = 220,789,000 ns
2016-04-22 14:46:49,704 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of open = 195,051,000 ns
2016-04-22 14:46:49,705 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:50,136 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of read() [pos = 0] = 430,721,000 ns
2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=1 
contentLength=20314850 statistics{streamAborted=0, streamOpened=1, 
streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, 
streamClosed=0, readExceptions=0, forwardSeekOperations=0, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:50,137 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of seek(256) [pos = 1] = 22,000 ns
2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 
contentLength=20314850 statistics{streamAborted=0, streamOpened=1, 
streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, 
streamClosed=0, readExceptions=0, forwardSeekOperations=0, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:50,138 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of seek(256) [pos = 256] = 17,000 ns
2016-04-22 14:46:50,139 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256 
contentLength=20314850 statistics{streamAborted=0, streamOpened=1, 
streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, 
streamClosed=0, readExceptions=0, forwardSeekOperations=0, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:50,140 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:50,141 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of seek(EOF-2) [pos = 256] = 25,000 ns
2016-04-22 14:46:50,141 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=20314848 
contentLength=20314850 statistics{streamAborted=0, streamOpened=1, 
streamCloseOperations=0, backwardSeekOperations=0, streamSeekOperations=0, 
streamClosed=0, readExceptions=0, forwardSeekOperations=0, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:50,142 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:51,069 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of read() [pos = 20314848] = 927,211,000 ns
2016-04-22 14:46:51,070 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20314849 
nextReadPos=20314849 contentLength=20314850 statistics{streamAborted=1, 
streamOpened=2, streamCloseOperations=1, backwardSeekOperations=0, 
streamSeekOperations=1, streamClosed=0, readExceptions=0, 
forwardSeekOperations=1, bytesSkippedOnSeek=0}}
2016-04-22 14:46:51,070 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:51,460 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(1, byte[1]) [pos = 20314849] = 389,682,000 ns
2016-04-22 14:46:51,461 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=2 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=1, streamOpened=3, 
streamCloseOperations=2, backwardSeekOperations=1, streamSeekOperations=2, 
streamClosed=1, readExceptions=0, forwardSeekOperations=1, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:51,461 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:52,583 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(1, byte[256]) [pos = 20314849] = 1,121,839,000 ns
2016-04-22 14:46:52,583 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=257 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=2, streamOpened=4, 
streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=3, 
streamClosed=1, readExceptions=0, forwardSeekOperations=1, 
bytesSkippedOnSeek=0}}
2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(260, byte[256]) [pos = 20314849] = 56,000 ns
2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=516 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=2, streamOpened=4, 
streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=4, 
streamClosed=1, readExceptions=0, forwardSeekOperations=2, 
bytesSkippedOnSeek=3}}
2016-04-22 14:46:52,584 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(1024, byte[256]) [pos = 20314849] = 27,000 ns
2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1280 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=2, streamOpened=4, 
streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=5, 
streamClosed=1, readExceptions=0, forwardSeekOperations=3, 
bytesSkippedOnSeek=511}}
2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:52,585 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(1536, byte[256]) [pos = 20314849] = 25,000 ns
2016-04-22 14:46:52,586 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1792 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=2, streamOpened=4, 
streamCloseOperations=3, backwardSeekOperations=2, streamSeekOperations=6, 
streamClosed=1, readExceptions=0, forwardSeekOperations=4, 
bytesSkippedOnSeek=767}}
2016-04-22 14:46:52,586 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:53,773 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(8192, byte[1024]) [pos = 20314849] = 1,186,885,000 ns
2016-04-22 14:46:53,774 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=9216 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=3, streamOpened=5, 
streamCloseOperations=4, backwardSeekOperations=2, streamSeekOperations=7, 
streamClosed=1, readExceptions=0, forwardSeekOperations=5, 
bytesSkippedOnSeek=767}}
2016-04-22 14:46:53,774 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of readFully(9728, byte[1024]) [pos = 20314849] = 1,116,103,000 ns
2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=4, streamOpened=6, 
streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, 
streamClosed=1, readExceptions=0, forwardSeekOperations=6, 
bytesSkippedOnSeek=767}}
2016-04-22 14:46:54,891 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:54,892 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of seek(getPos) [pos = 20314849] = 18,000 ns
2016-04-22 14:46:54,892 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20314849 
contentLength=20314850 statistics{streamAborted=4, streamOpened=6, 
streamCloseOperations=5, backwardSeekOperations=2, streamSeekOperations=8, 
streamClosed=1, readExceptions=0, forwardSeekOperations=6, 
bytesSkippedOnSeek=767}}
2016-04-22 14:46:54,893 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
2016-04-22 14:46:55,854 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of read() [pos = 20314849] = 960,889,000 ns
2016-04-22 14:46:55,854 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) -   
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20314850 
nextReadPos=20314850 contentLength=20314850 statistics{streamAborted=5, 
streamOpened=7, streamCloseOperations=6, backwardSeekOperations=2, 
streamSeekOperations=9, streamClosed=1, readExceptions=0, 
forwardSeekOperations=7, bytesSkippedOnSeek=767}}
2016-04-22 14:46:55,855 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Duration of close() = 70,000 ns
2016-04-22 14:46:55,862 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 
Statistics 3843 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 
write ops
2016-04-22 14:46:55,863 INFO  s3.S3aIOSuite (Logging.scala:logInfo(54)) - 

{code}

> S3a Forward seek in stream length to be configurable
> ----------------------------------------------------
>
>                 Key: HADOOP-13047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13047
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>         Attachments: HADOOP-13047.WIP.patch
>
>
> Even with lazy seek, tests can show that sometimes a short-distance forward 
> seek is triggering a close + reopen, because the threshold for the seek is 
> simply available bytes in the inner stream.
> A configurable threshold would allow data to be read and discarded before 
> that seek. This should be beneficial over long-haul networks as the time to 
> set up the TCP channel is high, and TCP-slow-start means that the ramp up of 
> bandwidth is slow. In such deployments, it will better to read forward than 
> re-open, though the exact "best" number will vary with client and endpoint.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to