[
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274014#comment-15274014
]
Steve Loughran commented on HADOOP-13028:
-----------------------------------------
patch 010, addressing chris's previous comments:
1. visibility: done.Added interface scope & stability attributes to all the S3a
classes. Everything is marked as {{@Private}}; stability either {{@Evolving}}
for subclasses of standard classes; {{@Unstable}} for all the metrics, and
{{Stable}} for those classes which are used in the AWS toolkit.
The {{org.apache.hadoop.fs.s3a.Constants}} is tagged as {{Stable, Evolving}} as
it is intended for public use and must only add new attributes, not remove or
change existing key names.
2. {{initMultipartUploads}} error ignored increment: done. I also renamed the
operation {{errorIgnored()}} to give it more of a verb-style use.
3. Rename {{catch}} handlers. These are all 404 handling, as far as I can see,
and really part of the legit workflow. What
I have done is made sure those handlers log at debug.
4. done in both places.
5. done. Also looked at the others, added it everywhere, removed the
{{LOG.isDebug}} wrappers for all the low-cost statements.
6. done
7. done. Also moved the blocksize get into a field; print that out too, fail
init if the block is ever zero or less
8. OK: removed
9. done: setReadahead(null) -> default.
10. that came from the Azure stuff, but as its not used here (no looking at
rolling window stuff) I've cut it.
11. If I could find a useful gauge, I would use it, so can we keep it in there
until needed? Or cull it and re-implement as required?
12. done
In {{copyFromLocalFile}} I do the {{statistics.incrementWriteOps(1)}} ahead of
calling the transfer. That way, if the call failed, the counter still goes up.
Note that in {{copyFile}} the counter is actually going up twice, as the
progress listener incs it too. I think we should pull the {{incrementWriteOps}}
call up ahead of {{transfers.copy}} and remove the one from the progress
listener.
maybe also in {{createEmptyObject}}
Checkstyle is whining about the inner stats fields not having accessors; I'm
ignoring that on a point of principle: it's being over fussy.
To keep it happy I went and fixed up line width across much of
{{S3AFilesystem}}. That'll keep the number down.
Added validation of minimum values to all the various arguments
(buffer/partition sizes, timeouts, etc), so that
negative values -> {{IllegalArgumentException}}. Really we should add that
option to {{Configuration}}, so that people don't
forget to add those checks in their own code (and how much of Hadoop doesn't
...). Any (shipped) option where a too-low value
is fixed to be a minimum is left alone for compatibility. Some of the options
were checked downstream (e.g some of the socket options);
now the exception raised includes the configuration key at fault.
Fixed all javadoc errors. JDK8 fails on any error, so I've touched some files
{{S3AFastOutputStream}}, {{S3Credentials}}
{{org.apache.hadoop.fs.s3.FileSystemStore}} because of the javadoc problem.
Note that the title of this JIRA is getting less accurate, more now: instrument
S3a, implement readahead, validate input params and wrap up other issues. Those
other JIRAs contained could be closed as fixed for explicitness.
> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
> Key: HADOOP-13028
> URL: https://issues.apache.org/jira/browse/HADOOP-13028
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, metrics
> Affects Versions: 2.8.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch,
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch,
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch,
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
> HADOOP-13028-branch-2-010.patch,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive,
> closing connections may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the # of
> open/close/failure+reconnect operations, timers of how long things take. This
> can be used downstream to measure efficiency of the code (how often
> connections are being made), connection reliability, etc.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]