[ 
https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782
 ] 

Colin Patrick McCabe edited comment on HADOOP-13028 at 5/11/16 8:39 PM:
------------------------------------------------------------------------

In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?
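
A minimal sketch of the reflection approach mentioned above, not taken from Spark or HBase. The class name {{OptionalApi}}, its helpers, and the method name {{getStreamStatistics}} are hypothetical and chosen only for illustration. The idea is to look the method up at runtime and fall back to a default when the running version does not have it:

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper: calls an API only if the running library version has it.
final class OptionalApi {

    // Looks up a public no-arg method by name; empty if this version lacks it.
    static Optional<Method> findMethod(Class<?> cls, String name) {
        try {
            return Optional.of(cls.getMethod(name));
        } catch (NoSuchMethodException e) {
            return Optional.empty();
        }
    }

    // Invokes the method when present; otherwise returns the fallback value.
    static Object invokeOrDefault(Object target, String name, Object fallback) {
        Optional<Method> method = findMethod(target.getClass(), name);
        if (!method.isPresent()) {
            return fallback;
        }
        try {
            return method.get().invoke(target);
        } catch (ReflectiveOperationException e) {
            // Method exists but could not be called; treat it as unavailable.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // String has length(), so the reflective call succeeds.
        System.out.println(invokeOrDefault("abc", "length", -1));
        // String has no getStreamStatistics(), so the fallback is returned.
        System.out.println(invokeOrDefault("abc", "getStreamStatistics", "n/a"));
    }
}
```

A caller would pass the stream object and the name of the newer accessor, and treat the fallback as "statistics not available in this Hadoop version".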

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the stream.

I guess you could argue that the statistics are part of the stream state, and 
toString is intended to reflect stream state.  But it will result in very long 
output from toString, which probably isn't what most existing callers want.  And 
it's not consistent with the way any other Hadoop streams work, including other 
S3 ones like s3n.


was (Author: cmccabe):
In the past I've written code for Spark that used reflection to make use of 
APIs that may or may not be present in Hadoop.  HBase often does this as well, 
so that it can use multiple versions of Hadoop.  It seems like this wouldn't be 
a lot of code.  Is that feasible in this case?

I just find the argument that we should overload an existing unrelated API to 
output statistics very off-putting.  It's like saying we should override 
hashCode to output the number of times the user called {{seek()}} on the 
stream.  I also find it concerning that this would be something unique to s3a 
and not present in the toString methods of any other filesystem (including the 
other s3 ones).  It feels like a gross hack.

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, 
> HADOOP-13028-004.patch, HADOOP-13028-005.patch, HADOOP-13028-006.patch, 
> HADOOP-13028-007.patch, HADOOP-13028-008.patch, HADOOP-13028-009.patch, 
> HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch, 
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt, 
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, 
> and closing connections may be expensive (a sign of a regression). 
> S3A FS and individual input streams should have counters of the number of 
> open/close/failure+reconnect operations, and timers of how long things take. 
> This can be used downstream to measure efficiency of the code (how often 
> connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

