[ https://issues.apache.org/jira/browse/HADOOP-14988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222168#comment-16222168 ]

Steve Loughran commented on HADOOP-14988:
-----------------------------------------

As discussed in HADOOP-14973

* I concur with the need to collect client-side statistics from the object 
store clients, especially related to failures and throttling, as that answers 
the question "why are things so slow?" (a reading sketch follows this list).
* I also see that classic metric publishing isn't always the right way to do 
it. Sometimes it is: if a specific node is failing the most, that's a node 
problem for cluster management tools to detect and react to. But if it's a 
specific job being throttled, that's not an admin problem; it's a job-config 
and store-layout problem, which needs to be reported at the job level.
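
To make the first point concrete, here's a minimal sketch (not part of any 
patch) of pulling the client-side statistics a connector already tracks out of 
a FileSystem instance through the hadoop-common StorageStatistics API 
(Hadoop 2.8+). The wasb:// URI is a placeholder, and which throttle/failure 
counters appear depends entirely on what the connector registers:

{code:java}
import java.net.URI;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.StorageStatistics;

public class DumpClientSideStats {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder store URI; substitute the job's real output filesystem.
    FileSystem fs = FileSystem.get(
        new URI("wasb://container@account.blob.core.windows.net/"), conf);

    // StorageStatistics is the hadoop-common statistics API; connectors such
    // as S3A/WASB can register their own counters (retries, throttling, ...).
    StorageStatistics stats = fs.getStorageStatistics();
    Iterator<StorageStatistics.LongStatistic> it = stats.getLongStatistics();
    while (it.hasNext()) {
      StorageStatistics.LongStatistic s = it.next();
      // A real job would report these per task/job rather than print them.
      System.out.println(s.getName() + " = " + s.getValue());
    }
  }
}
{code}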

w.r.t. using Hadoop counters for this: it's cute, but these are not "Hadoop 
counters", they are MapReduce counters; you can't have a filesystem in 
hadoop-common using or publishing them. Which means an alternative means of 
publishing them is needed (one candidate is sketched below).
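
One candidate that already lives in hadoop-common is the GlobalStorageStatistics 
registry, which aggregates statistics per scheme within a JVM and which any 
execution engine (MR, Spark, Tez, ...) could poll and map onto its own counter 
mechanism. A hedged sketch, with printing standing in for whatever the engine 
provides:

{code:java}
import java.util.Iterator;

import org.apache.hadoop.fs.GlobalStorageStatistics;
import org.apache.hadoop.fs.StorageStatistics;

public class PublishFromGlobalRegistry {
  public static void main(String[] args) {
    // Every FileSystem scheme that registers statistics shows up here,
    // aggregated per scheme within the current JVM.
    Iterator<StorageStatistics> schemes = GlobalStorageStatistics.INSTANCE.iterator();
    while (schemes.hasNext()) {
      StorageStatistics perScheme = schemes.next();
      Iterator<StorageStatistics.LongStatistic> it = perScheme.getLongStatistics();
      while (it.hasNext()) {
        StorageStatistics.LongStatistic s = it.next();
        // An engine would feed these into its own per-job counters
        // instead of printing them.
        System.out.println(perScheme.getName() + "." + s.getName()
            + " = " + s.getValue());
      }
    }
  }
}
{code}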

# Hadoop MR could collect the stats from the output filesystem & uprate them 
to MR counters (sketched after this list). Issue: do you want this per FS 
instance, or aggregated across all instances of an FS class?
# The stuff could be collected by the committer and propagated back anyway. 
This is what I'm doing in the S3A committers, where I write the stats to 
_SUCCESS. But that's across the entire set of filesystems of a specific schema 
(s3a:// here), not per query (moot in MR, different in Spark).
# Mingliang's per-thread work here is more foundational, as you want all the 
stats for a task.
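
As a hedged illustration of option 1: roughly what a committer-side up-rating 
could look like. The class and the counter group name are invented for the 
example, this is not the S3A committer code, and whether counters incremented 
this late in task commit actually propagate back to the AM is exactly the sort 
of thing that would need verifying:

{code:java}
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StorageStatistics;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

public class StatsPublishingCommitter extends FileOutputCommitter {
  private final Path outputPath;

  public StatsPublishingCommitter(Path outputPath, TaskAttemptContext context)
      throws IOException {
    super(outputPath, context);
    this.outputPath = outputPath;
  }

  @Override
  public void commitTask(TaskAttemptContext context) throws IOException {
    super.commitTask(context);
    // Up-rate the destination FS's client-side storage statistics into task
    // counters. Note these are per-FS-instance (effectively per-scheme in the
    // task JVM), not per-query.
    FileSystem fs = outputPath.getFileSystem(context.getConfiguration());
    StorageStatistics stats = fs.getStorageStatistics();
    Iterator<StorageStatistics.LongStatistic> it = stats.getLongStatistics();
    while (it.hasNext()) {
      StorageStatistics.LongStatistic s = it.next();
      context.getCounter("FileSystemStorageStatistics", s.getName())
          .increment(s.getValue());
    }
  }
}
{code}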

Overall then, yes: I want the counters, not things lost in logs. But we need 
something which (a) is cross-engine and (b) works on multitenant execution 
engines, so that stats can be tied back to specific jobs.



> WASB: Expose WASB status metrics as counters in Hadoop
> ------------------------------------------------------
>
>                 Key: HADOOP-14988
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14988
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> It would be good to expose WASB status metrics (e.g. 503s) as Hadoop counters. 
> Here is an example from a Spark job, where it ends up spending a large amount 
> of time in retries. Adding Hadoop counters would help in analyzing and tuning 
> long-running tasks.
> {noformat}
> 2017-10-23 23:07:20,876 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:20,877 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:07:21,877 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:21,879 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:07:24,070 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:24,073 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: threadId=99, Status=503, 
> Elapsed(ms)=3, ETAG=null, contentLength=198, requestMethod=GET
> 2017-10-23 23:07:27,917 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:27,920 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=2, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:07:36,879 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:36,881 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=1, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:07:54,786 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:07:54,789 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=3, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:08:24,790 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> 2017-10-23 23:08:24,794 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: ResponseReceived: 
> threadId=99, Status=503, Elapsed(ms)=4, ETAG=null, contentLength=198, 
> requestMethod=GET
> 2017-10-23 23:08:54,794 DEBUG [Executor task launch worker for task 2463] 
> azure.SelfThrottlingIntercept:  SelfThrottlingIntercept:: SendingRequest:   
> threadId=99, requestType=read , isFirstRequest=false, sleepDuration=0
> {noformat}


