[ 
https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471115#comment-17471115
 ] 

László Bodor edited comment on NUTCH-2839 at 1/8/22, 12:08 PM:
---------------------------------------------------------------

I took a look, and I think this needs to be implemented on the Tez side.
As far as I can see, Tez simply returns empty counters.
When the Nutch MR job is submitted to the cluster, 
[YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java]
 intercepts it and handles the rest:

1. we call 
[job.getCounters()|https://github.com/apache/hadoop/blob/f64fda0f00b22793a9c5ea10f9d73ef33fa2b563/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Job.java#L815]
2. getClient() returns the actual ClientProtocol, which is Tez's 
[YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java]
 this time
3. YARNRunner forwards the call to 
[ClientServiceDelegate|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java]
4. in ClientServiceDelegate, getJobCounters returns an [empty Counters 
object|https://github.com/apache/tez/blob/a6a936dad34397226adcb672f25184169ecbcb71/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java#L54]

So this is something to be implemented on the Tez side.
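
To make the call chain above concrete, here is a minimal, self-contained sketch of the delegation pattern. It does not use the real Hadoop or Tez classes: JobSketch, RunnerSketch, DelegateSketch and ClientProtocolSketch are hypothetical stand-ins for Job, YARNRunner, ClientServiceDelegate and ClientProtocol, and a plain Map stands in for the Counters object. It only illustrates why the caller ends up with empty counters, assuming the delegate behaves as described in step 4.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptyCountersSketch {

    /** Hypothetical stand-in for the protocol object returned by getClient(). */
    interface ClientProtocolSketch {
        Map<String, Long> getJobCounters(String jobId);
    }

    /** Hypothetical stand-in for the delegate: it always answers with an empty counters object. */
    static class DelegateSketch {
        Map<String, Long> getJobCounters(String jobId) {
            // Nothing is looked up for jobId, so nothing is ever filled in.
            return new HashMap<>();
        }
    }

    /** Hypothetical stand-in for the runner: it only forwards the call to the delegate. */
    static class RunnerSketch implements ClientProtocolSketch {
        private final DelegateSketch delegate = new DelegateSketch();

        @Override
        public Map<String, Long> getJobCounters(String jobId) {
            return delegate.getJobCounters(jobId);
        }
    }

    /** Hypothetical stand-in for the job: getCounters() asks whatever client it was given. */
    static class JobSketch {
        private final String jobId;
        private final ClientProtocolSketch client;

        JobSketch(String jobId, ClientProtocolSketch client) {
            this.jobId = jobId;
            this.client = client;
        }

        Map<String, Long> getCounters() {
            return client.getJobCounters(jobId);
        }
    }

    public static void main(String[] args) {
        // The job id is an arbitrary placeholder value.
        JobSketch job = new JobSketch("job_0001", new RunnerSketch());
        Map<String, Long> counters = job.getCounters();
        System.out.println("counters empty: " + counters.isEmpty());
    }
}
```

Running main prints "counters empty: true": whatever the job counted while running, the caller sees nothing, because the last link in the chain never populates the result.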


> Implement Tez counters in Injector job
> --------------------------------------
>
>                 Key: NUTCH-2839
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2839
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: injector, tez
>    Affects Versions: 1.18
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>
> When running the Injector job on Tez, counters are not populated. This makes 
> sense as all existing counters are created using MapReduce framework Context 
> objects. This presents a major issue however. Counters are a requirement as 
> they are key to regular inspections of ongoing crawls, finding errors and 
> debugging. The [org.apache.tez.common.counters 
> |https://tez.apache.org/releases/0.9.2/tez-api-javadocs/index.html?org/apache/tez/common/counters/package-summary.html]
>  package may offer an equivalent replacement. This issue will be investigated 
> in this ticket.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
