[ https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471115#comment-17471115 ]
László Bodor (abstractdog) edited comment on NUTCH-2839 at 1/8/22, 12:08 PM:
---------------------------------------------------------------

I took a look, and I think this needs to be implemented on the Tez side. Basically, Tez returns empty counters.

As far as I can see, the Nutch MR job is submitted to the cluster, and [YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java] intercepts it and handles the rest:

1. We call [job.getCounters()|https://github.com/apache/hadoop/blob/f64fda0f00b22793a9c5ea10f9d73ef33fa2b563/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Job.java#L815].
2. getClient() returns the actual ClientProtocol, which is the Tez [YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java] this time.
3. YARNRunner forwards the call to [ClientServiceDelegate|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java].
4. In ClientServiceDelegate, getJobCounters returns an [empty Counters object|https://github.com/apache/tez/blob/a6a936dad34397226adcb672f25184169ecbcb71/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java#L54].

So this is something to be implemented on the Tez side.

> Implement Tez counters in Injector job
> --------------------------------------
>
>                 Key: NUTCH-2839
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2839
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: injector, tez
>    Affects Versions: 1.18
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>
> When running the Injector job on Tez, counters are not populated. This makes
> sense, as all existing counters are created using MapReduce framework Context
> objects. This presents a major issue, however: counters are a requirement, as
> they are key to regular inspections of ongoing crawls, finding errors and
> debugging. The [org.apache.tez.common.counters|https://tez.apache.org/releases/0.9.2/tez-api-javadocs/index.html?org/apache/tez/common/counters/package-summary.html]
> package may offer an equivalent replacement. This issue will be investigated
> in this ticket.
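
For illustration, the call chain described in the comment (job, then client, then runner, then delegate) can be modelled with a few stand-in types. This is only a sketch under stated assumptions: every type and method name below (ClientStub, ForwardingRunner, EmptyDelegate, getCounters) is hypothetical and is not the Hadoop or Tez API, and counters are reduced to a plain map. It shows why the caller sees no counters even though every forwarding step works: the last link in the chain never fills anything in.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class EmptyCountersSketch {

    /** Hypothetical stand-in for the client protocol the job delegates to. */
    interface ClientStub {
        Map<String, Long> getJobCounters(String jobId);
    }

    /** Stand-in for a delegate without counter support: always returns an empty result. */
    static class EmptyDelegate implements ClientStub {
        @Override
        public Map<String, Long> getJobCounters(String jobId) {
            return Collections.emptyMap();
        }
    }

    /** Stand-in for a delegate that does report counters (hypothetical values). */
    static class ReportingDelegate implements ClientStub {
        @Override
        public Map<String, Long> getJobCounters(String jobId) {
            Map<String, Long> counters = new HashMap<>();
            counters.put("urls_injected", 42L);
            return counters;
        }
    }

    /** Stand-in for the runner: forwards the call to its delegate unchanged. */
    static class ForwardingRunner implements ClientStub {
        private final ClientStub delegate;

        ForwardingRunner(ClientStub delegate) {
            this.delegate = delegate;
        }

        @Override
        public Map<String, Long> getJobCounters(String jobId) {
            return delegate.getJobCounters(jobId);
        }
    }

    /** Stand-in for the job-level call: asks whichever client is configured. */
    static Map<String, Long> getCounters(ClientStub client, String jobId) {
        return client.getJobCounters(jobId);
    }

    public static void main(String[] args) {
        Map<String, Long> empty =
                getCounters(new ForwardingRunner(new EmptyDelegate()), "job_0001");
        Map<String, Long> filled =
                getCounters(new ForwardingRunner(new ReportingDelegate()), "job_0001");
        System.out.println("empty delegate, counters empty: " + empty.isEmpty());
        System.out.println("reporting delegate, counters empty: " + filled.isEmpty());
    }
}
```

In this model the caller and the runner are identical in both cases and only the delegate differs, which matches the comment's conclusion that the change belongs in the delegate rather than in the calling job.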