[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267729#comment-15267729
]
Chris Nauroth commented on HADOOP-13028:
----------------------------------------
[[email protected]], I've spent more time reading the seek code changes, and
I'm pretty confident that they're correct overall, but I have a few more
comments.
# {{S3AInputStream#closeStream}} has the following log message. The text of
the message says it's logging {{contentLength}}, but the value actually passed
is {{length}}. I imagine {{length}} is the more interesting value here, so
should the message text be changed to say {{length}} instead?
{code}
LOG.debug("Stream {} {}: {}; streamPos={}, nextReadPos={}," +
" contentLength={}",
uri, (shouldAbort ? "aborted":"closed"), reason, pos, nextReadPos,
length);
{code}
# Actually, that makes me realize I am unclear about a change made in
HADOOP-12444. {{S3AInputStream#reopen}} has a stream length calculation that
gets passed into the range request.
{code}
requestedStreamLen = (length < 0) ? this.contentLength :
Math.max(this.contentLength, (CLOSE_THRESHOLD + (targetPos + length)));
...
GetObjectRequest request = new GetObjectRequest(bucket, key)
.withRange(targetPos, requestedStreamLen);
{code}
Please tell me if I'm misunderstanding something, but I believe this
calculation always results in an upper bound on the range that effectively
means "get the whole thing." That {{Math.max}} call guarantees that the value
is always at least {{contentLength}}, which is the whole file length. Is this
a bug in the HADOOP-12444 patch?
# {{InputStreamStatistics#seekBackwards}} accepts {{offset}} as an argument but
doesn't use it. Is there supposed to be another counter for back-skipped
bytes? At the call site within {{S3AInputStream#seekInStream}}, the value it
passes would be negative, so we'd need to be careful of that.
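To make the last two points concrete, here is a minimal standalone sketch. It is not the S3A source: the class name, the {{CLOSE_THRESHOLD}} value, the {{Math.min}} variant, and the {{SeekStats}} counter names are all assumptions for illustration. It shows that the quoted {{Math.max}} expression never returns less than {{contentLength}}, what a capped variant would return instead, and one way a backward-seek counter could absorb a negative offset.

```java
// Illustrative sketch only; names and constants are assumptions, not the S3A code.
final class SeekReviewSketch {
    // Assumed value for the demo; the real CLOSE_THRESHOLD may differ.
    static final long CLOSE_THRESHOLD = 4096;

    /** The calculation as quoted from S3AInputStream#reopen. */
    static long quotedRequestedLen(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.max(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    /** Hypothetical alternative: cap the range at contentLength instead. */
    static long cappedRequestedLen(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.min(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    /** Hypothetical counters for backward seeks. */
    static final class SeekStats {
        long backwardSeekOperations;
        long bytesBackwardsOnSeek;

        /** The caller passes a negative offset, so subtract it to count bytes. */
        void seekBackwards(long negativeOffset) {
            backwardSeekOperations++;
            bytesBackwardsOnSeek -= negativeOffset;
        }
    }

    public static void main(String[] args) {
        long contentLength = 1_000_000L;
        // Small read at the start of a large object.
        System.out.println(quotedRequestedLen(contentLength, 0, 100)); // 1000000
        System.out.println(cappedRequestedLen(contentLength, 0, 100)); // 4196

        SeekStats stats = new SeekStats();
        stats.seekBackwards(-512);
        System.out.println(stats.bytesBackwardsOnSeek); // 512
    }
}
```

With the quoted expression, a 100-byte read at position 0 still asks for the full object length. Whether a capped value like the {{Math.min}} variant matches the intent of HADOOP-12444 is the open question above.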
> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch,
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch,
> HADOOP-13028-007.patch, HADOOP-13028-008.patch,
> HADOOP-13028-branch-2-008.patch,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive,
> and closing connections may be expensive too (a sign of a regression).
> S3A FS and individual input streams should have counters of the # of
> open/close/failure+reconnect operations, timers of how long things take. This
> can be used downstream to measure efficiency of the code (how often
> connections are being made), connection reliability, etc.