[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274641#comment-15274641
]
Colin Patrick McCabe commented on HADOOP-13028:
-----------------------------------------------
{code}
<property>
  <name>fs.s3a.readahead.range</name>
  <value>65536</value>
  <description>Bytes to read ahead during a seek() before closing and
  re-opening the S3 HTTP connection.</description>
</property>
{code}
Hmm, should this be {{fs.s3a.readahead.default}}? It seems like this is the
default if the user doesn't call {{FSDataInputStream#setReadahead}}.
{{S3AInputStream#closed}}: it seems like this should be an {{AtomicBoolean}}.
Otherwise two threads could both enter this code block, right?
{code}
if (!closed) {
  closed = true;
  super.close();
  closeStream("close() operation", this.contentLength);
  streamStatistics.close();
}
{code}
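For illustration, a minimal sketch of the {{AtomicBoolean}} guard suggested here. {{GuardedStream}} and {{cleanupRuns}} are hypothetical stand-ins, not names from the patch, and the real cleanup calls are reduced to a counter.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for S3AInputStream; only the close() guard is shown. */
class GuardedStream extends InputStream {
  private final AtomicBoolean closed = new AtomicBoolean(false);
  // Counts how often the cleanup body ran; exists only for this illustration.
  private final AtomicInteger cleanupRuns = new AtomicInteger();

  @Override
  public int read() throws IOException {
    if (closed.get()) {
      throw new IOException("Stream is closed");
    }
    return -1; // this sketch carries no data
  }

  @Override
  public void close() throws IOException {
    // compareAndSet lets exactly one caller flip false -> true,
    // so the cleanup below cannot run twice even under contention.
    if (closed.compareAndSet(false, true)) {
      super.close();
      // Placeholder for closeStream(...) and streamStatistics.close().
      cleanupRuns.incrementAndGet();
    }
  }

  int cleanupRuns() {
    return cleanupRuns.get();
  }
}
```

With {{compareAndSet}}, only the first caller runs the cleanup; a concurrent or repeated {{close()}} becomes a no-op.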
{code}
public S3AInstrumentation.InputStreamStatistics getStreamStatistics() {
{code}
Maybe this should be called {{getS3StreamStatistics}}, reflecting the fact that
this API is S3-specific?
Is it really necessary to put statistics information into the {{toString}}
methods of the streams? It seems like this could lead to compatibility woes,
and we have the API described above to provide this information anyway.
> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch,
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch,
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch,
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
> HADOOP-13028-branch-2-010.patch,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive,
> closing connections may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the # of
> open/close/failure+reconnect operations, timers of how long things take. This
> can be used downstream to measure efficiency of the code (how often
> connections are being made), connection reliability, etc.