[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15256884#comment-15256884
]
Steve Loughran commented on HADOOP-13028:
-----------------------------------------
Patch -004 includes HADOOP-13047; the forward read range is now configurable.
The default is 64K; we'll need tests to work out what value is good in
different deployments (in-EC2; remote). In my tests, 640K looks right.
There are a lot of tests for the seek behaviour: seeks with no read to verify
lazy seek, then some seek+read sequences to see how things slow down with
different readahead values.
BTW, the readahead can be set on an open stream via
{{CanSetReadahead.setReadahead(Long)}}; this could enable some code to
dynamically tune things if it really knew what it was doing. I'm using it in
the tests to simplify their setup.
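
To make the trade-off concrete, here is a minimal sketch of lazy seek with a tunable forward read range. It is a toy model, not the S3A input stream: the class {{LazySeekModel}}, its fields and its counters are all hypothetical, and the rule it applies (skip forward on the open connection when the gap is within the readahead, otherwise close and reopen) is an assumption about the general approach rather than a copy of the patch.

```java
/** Toy model of a lazily seeking stream over a remote object. Hypothetical; not the S3A code. */
final class LazySeekModel {
    private long readahead;        // forward read range in bytes
    private long streamPos = -1;   // position of the open connection, -1 if none is open
    private long nextReadPos = 0;  // position requested by the most recent seek()
    private long opens = 0;        // counter: connections opened
    private long closes = 0;       // counter: connections closed in order to reposition
    private long bytesSkipped = 0; // counter: bytes read and discarded on forward seeks

    LazySeekModel(long readahead) {
        setReadahead(readahead);
    }

    /** Can be called at any time, including between reads on an "open" stream. */
    void setReadahead(long readahead) {
        if (readahead < 0) {
            throw new IllegalArgumentException("negative readahead");
        }
        this.readahead = readahead;
    }

    /** Lazy seek: only record the target; no connection work happens here. */
    void seek(long pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative position");
        }
        nextReadPos = pos;
    }

    /** Read len bytes at the pending position, repositioning the connection first if needed. */
    void read(int len) {
        if (streamPos < 0) {
            open(nextReadPos);
        } else if (nextReadPos != streamPos) {
            long diff = nextReadPos - streamPos;
            if (diff > 0 && diff <= readahead) {
                // Short forward seek: cheaper to read and discard on the open connection.
                bytesSkipped += diff;
                streamPos = nextReadPos;
            } else {
                // Backward seek, or forward beyond the readahead range: close and reopen.
                closes++;
                open(nextReadPos);
            }
        }
        streamPos += len;
        nextReadPos = streamPos;
    }

    private void open(long pos) {
        opens++;
        streamPos = pos;
    }

    long opens() { return opens; }
    long closes() { return closes; }
    long bytesSkipped() { return bytesSkipped; }
}
```

In this model a run of {{seek()}} calls with no {{read()}} never opens a connection, and raising the readahead from 64K to 640K turns a 640K forward seek from a close+reopen into a skip. Which of the two is cheaper in a real deployment is what the tests have to measure.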
> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch,
> HADOOP-13028-004.patch,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive;
> closing connections may also be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the number of
> open/close/failure+reconnect operations, and timers of how long things take.
> These can be used downstream to measure the efficiency of the code (how often
> connections are being made), connection reliability, etc.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)