[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254420#comment-15254420
]
Steve Loughran commented on HADOOP-13028:
-----------------------------------------
Output of test run against a test CSV dataset against Amazon US, highlighting
how in the test infrastructure (my laptop) the time to set up a read at a new
offset is ~1s, so forward seeking through reading bytes should be much less
expensive.
{code}
===== TEST OUTPUT FOR o.a.s.cloud.s3.S3aIOSuite: 'SeekReadFully: Cost of seek
and ReadFully' =====
2016-04-22 19:19:20,295 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of stat = 206,270,000 ns
2016-04-22 19:19:20,486 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of open = 190,536,000 ns
2016-04-22 19:19:20,487 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:20,926 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of read() [pos = 0] = 438,378,000 ns
2016-04-22 19:19:20,926 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=1
contentLength=20320279}
2016-04-22 19:19:20,927 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=1}
{streamCloseOperations=0} {streamClosed=0} {streamAborted=0}
{streamSeekOperations=0} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=0} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:20,927 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:20,927 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of seek(256) [pos = 1] = 14,000 ns
2016-04-22 19:19:20,928 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256
contentLength=20320279}
2016-04-22 19:19:20,928 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=1}
{streamCloseOperations=0} {streamClosed=0} {streamAborted=0}
{streamSeekOperations=0} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=0} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:20,928 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:20,929 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of seek(256) [pos = 256] = 8,000 ns
2016-04-22 19:19:20,930 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=256
contentLength=20320279}
2016-04-22 19:19:20,930 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=1}
{streamCloseOperations=0} {streamClosed=0} {streamAborted=0}
{streamSeekOperations=0} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=0} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:20,930 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:20,930 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of seek(EOF-2) [pos = 256] = 11,000 ns
2016-04-22 19:19:20,930 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1 nextReadPos=20320277
contentLength=20320279}
2016-04-22 19:19:20,931 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=1}
{streamCloseOperations=0} {streamClosed=0} {streamAborted=0}
{streamSeekOperations=0} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=0} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:20,931 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:21,913 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of read() [pos = 20320277] = 981,224,000 ns
2016-04-22 19:19:21,913 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20320278
nextReadPos=20320278 contentLength=20320279}
2016-04-22 19:19:21,913 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{2 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=2}
{streamCloseOperations=1} {streamClosed=0} {streamAborted=1}
{streamSeekOperations=0} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=0} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:21,914 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:22,287 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(1, byte[1]) [pos = 20320278] = 373,107,000 ns
2016-04-22 19:19:22,288 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=2 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:22,288 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{3 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=3}
{streamCloseOperations=2} {streamClosed=1} {streamAborted=1}
{streamSeekOperations=1} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=1} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:22,289 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:23,366 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(1, byte[256]) [pos = 20320278] = 1,077,370,000 ns
2016-04-22 19:19:23,367 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=257 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:23,367 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{259 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=4}
{streamCloseOperations=3} {streamClosed=1} {streamAborted=2}
{streamSeekOperations=2} {readExceptions=0} {forwardSeekOperations=0}
{backwardSeekOperations=2} {bytesSkippedOnSeek=0} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:23,367 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:23,368 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(260, byte[256]) [pos = 20320278] = 60,000 ns
2016-04-22 19:19:23,368 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=516 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:23,368 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{518 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=4}
{streamCloseOperations=3} {streamClosed=1} {streamAborted=2}
{streamSeekOperations=3} {readExceptions=0} {forwardSeekOperations=1}
{backwardSeekOperations=2} {bytesSkippedOnSeek=3} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:23,368 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:23,368 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(1024, byte[256]) [pos = 20320278] = 28,000 ns
2016-04-22 19:19:23,369 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1280 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:23,369 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1282 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=4}
{streamCloseOperations=3} {streamClosed=1} {streamAborted=2}
{streamSeekOperations=4} {readExceptions=0} {forwardSeekOperations=2}
{backwardSeekOperations=2} {bytesSkippedOnSeek=511} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:23,369 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:23,370 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(1536, byte[256]) [pos = 20320278] = 36,000 ns
2016-04-22 19:19:23,370 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=1792 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:23,370 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{1794 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=4}
{streamCloseOperations=3} {streamClosed=1} {streamAborted=2}
{streamSeekOperations=5} {readExceptions=0} {forwardSeekOperations=3}
{backwardSeekOperations=2} {bytesSkippedOnSeek=767} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:23,370 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:24,473 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(8192, byte[1024]) [pos = 20320278] = 1,102,690,000 ns
2016-04-22 19:19:24,474 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=9216 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:24,474 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{2818 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=5}
{streamCloseOperations=4} {streamClosed=1} {streamAborted=3}
{streamSeekOperations=5} {readExceptions=0} {forwardSeekOperations=3}
{backwardSeekOperations=2} {bytesSkippedOnSeek=767} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:24,475 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:25,644 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of readFully(9728, byte[1024]) [pos = 20320278] = 1,168,764,000 ns
2016-04-22 19:19:25,644 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:25,645 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{3842 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=6}
{streamCloseOperations=5} {streamClosed=1} {streamAborted=4}
{streamSeekOperations=5} {readExceptions=0} {forwardSeekOperations=3}
{backwardSeekOperations=2} {bytesSkippedOnSeek=767} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:25,645 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:25,645 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of seek(getPos) [pos = 20320278] = 18,000 ns
2016-04-22 19:19:25,646 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=10752 nextReadPos=20320278
contentLength=20320279}
2016-04-22 19:19:25,646 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{3842 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=6}
{streamCloseOperations=5} {streamClosed=1} {streamAborted=4}
{streamSeekOperations=5} {readExceptions=0} {forwardSeekOperations=3}
{backwardSeekOperations=2} {bytesSkippedOnSeek=767} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:25,646 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
2016-04-22 19:19:26,717 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of read() [pos = 20320278] = 1,070,520,000 ns
2016-04-22 19:19:26,718 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AInputStream{s3a://landsat-pds/scene_list.gz pos=20320279
nextReadPos=20320279 contentLength=20320279}
2016-04-22 19:19:26,718 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
S3AFileSystem{uri=s3a://landsat-pds, workingDir=s3a://landsat-pds/user/stevel,
partSize=104857600, enableMultiObjectsDelete=true,
multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='null', statistics
{3843 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0 write ops},
metrics {{Context=S3AFileSystem}
{FileSystemId=9042fe44-6438-4cc5-b3bf-d594dc71e699} {streamOpened=7}
{streamCloseOperations=6} {streamClosed=1} {streamAborted=5}
{streamSeekOperations=5} {readExceptions=0} {forwardSeekOperations=3}
{backwardSeekOperations=2} {bytesSkippedOnSeek=767} {files_created=0}
{files_copied=0} {files_copied_bytes=0} {files_deleted=0}
{directories_created=0} {directories_deleted=0} {ignored_errors=0} }}
2016-04-22 19:19:26,719 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Duration of close() = 70,000 ns
2016-04-22 19:19:26,719 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
Statistics 3843 bytes read, 0 bytes written, 2 read ops, 0 large read ops, 0
write ops
2016-04-22 19:19:26,720 INFO s3.S3aIOSuite (Logging.scala:logInfo(54)) -
===== FINISHED o.a.s.cloud.s3.S3aIOSuite: 'SeekReadFully: Cost of seek and
ReadFully' =====
{code}
> add counter and timer metrics for S3A HTTP & low-level operations
> -----------------------------------------------------------------
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
> Attachments: HADOOP-13028-001.patch
>
>
> against S3 (and other object stores), opening connections can be expensive,
> closing connections may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the # of
> open/close/failure+reconnect operations, timers of how long things take. This
> can be used downstream to measure efficiency of the code (how often
> connections are being made), connection reliability, etc.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)