[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274014#comment-15274014
 ] 

Steve Loughran commented on HADOOP-13028:
-----------------------------------------

patch 010, addressing chris's previous comments:



1. visibility: done.Added interface scope & stability attributes to all the S3a 
classes. Everything is marked as {{@Private}}; stability either {{@Evolving}} 
for subclasses of standard classes; {{@Unstable}} for all the metrics, and 
{{Stable}} for those classes which are used in the AWS toolkit. 

The {{org.apache.hadoop.fs.s3a.Constants}} is tagged as {{Stable, Evolving}} as 
it is intended for public use and must only add new attributes, not remove or 
change existing key names. 

2. {{initMultipartUploads}} error ignored increment: done. I also renamed the 
operation {{errorIgnored()}} to give it more of a verb-style use.
3. Rename {{catch}} handlers. These are all 404 handling, as far as I can see, 
and really part of the legit workflow. What 
I have done is made sure those handlers log at debug.
4. done in both places.
5. done. Also looked at the others, added it everywhere, removed the 
{{LOG.isDebug}} wrappers for all the low-cost statements.
6. done
7. done. Also moved the blocksize get into a field; print that out too, fail 
init if the block is ever zero or less
8. OK: removed
9. done: setReadahead(null) -> default.
10. that came from the Azure stuff, but as its not used here (no looking at 
rolling window stuff) I've cut it.
11. If I could find a useful gauge, I would use it, so can we keep it in there 
until needed? Or cull it and re-implement as required?
12. done

In {{copyFromLocalFile}} I do the  {{statistics.incrementWriteOps(1)}} ahead of 
calling the transfer. That way, if the call failed, the counter still goes up.

Note that in {{copyFile}} the counter is actually going up twice, as the 
progress listener incs it too. I think we should pull the {{incrementWriteOps}} 
call up ahead of {{transfers.copy}} and remove the one from the progress 
listener.

maybe also in {{createEmptyObject}}


Checkstyle is whining about the inner stats fields not having accessors; I'm 
ignoring that on a point of principle: it's being over fussy.
To keep it happy I went and fixed up line width across much of 
{{S3AFilesystem}}. That'll keep the number down.

Added validation of minimum values to all the various arguments 
(buffer/partition sizes, timeouts, etc), so that
negative values -> {{IllegalArgumentException}}. Really we should add that 
option to {{Configuration}}, so that people don't
forget to add those checks in their own code (and how much of Hadoop doesn't 
...). Any (shipped) option where a too-low value
is fixed to be a minimum is left alone for compatibility. Some of the options 
were checked downstream (e.g some of the socket options);
now the exception raised includes the configuration key at fault.

Fixed all javadoc errors. JDK8 fails on any error, so I've touched some files 
{{S3AFastOutputStream}}, {{S3Credentials}} 
{{org.apache.hadoop.fs.s3.FileSystemStore}} because of the javadoc problem.

Note that the title of this JIRA is getting less accurate, more now: instrument 
S3a, implement readahead, validate input params and wrap up other issues. Those 
other JIRAs contained could be closed as fixed for explicitness.

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> against S3 (and other object stores), opening connections can be expensive, 
> closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the # of 
> open/close/failure+reconnect operations, timers of how long things take. This 
> can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to