[ https://issues.apache.org/jira/browse/NUTCH-2839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471115#comment-17471115 ]
László Bodor edited comment on NUTCH-2839 at 1/8/22, 12:09 PM:
---------------------------------------------------------------
Took a look; I think this is something to be implemented on the Tez side.
Basically, Tez returns empty counters as far as I can see.
The Nutch MR job is submitted to the cluster, and Tez's [YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java] intercepts it and handles the rest:
1. We call [job.getCounters()|https://github.com/apache/hadoop/blob/f64fda0f00b22793a9c5ea10f9d73ef33fa2b563/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Job.java#L815] (in Injector, for instance).
2. getClient() returns the actual ClientProtocol, which in this case is Tez's [YARNRunner|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/YARNRunner.java].
3. YARNRunner forwards the call to [ClientServiceDelegate|https://github.com/apache/tez/blob/master/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java].
4. In ClientServiceDelegate, getJobCounters() returns an [empty Counters object|https://github.com/apache/tez/blob/a6a936dad34397226adcb672f25184169ecbcb71/tez-mapreduce/src/main/java/org/apache/tez/mapreduce/client/ClientServiceDelegate.java#L54].
So this is something to be implemented on the Tez side.
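To make the four-step call chain concrete, here is a minimal, stdlib-only Java model of it. It does not use the real Hadoop or Tez classes: `Counters`, `Protocol`, `TezLikeRunner`, `Delegate`, and `Job` are hypothetical stand-ins, and the counter name `urls_injected` in group `injector` is only illustrative. The model assumes nothing beyond what the steps state: the job asks its client protocol for counters, the Tez runner forwards the call to a delegate, and the delegate hands back a freshly created, empty counters object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class CounterChainModel {

    // Hypothetical stand-in for Hadoop's Counters: group -> (counter name -> value).
    static final class Counters {
        final Map<String, Map<String, Long>> groups = new HashMap<>();

        // Returns 0 when the group or counter is absent.
        long get(String group, String name) {
            return groups.getOrDefault(group, Collections.emptyMap()).getOrDefault(name, 0L);
        }

        boolean isEmpty() {
            return groups.isEmpty();
        }
    }

    // Hypothetical stand-in for ClientProtocol; only the counters call is modeled.
    interface Protocol {
        Counters getJobCounters(String jobId);
    }

    // Models step 4: the delegate builds a new, empty Counters object.
    static final class Delegate {
        Counters getJobCounters(String jobId) {
            return new Counters();
        }
    }

    // Models steps 2-3: the Tez-side runner forwards the call to its delegate.
    static final class TezLikeRunner implements Protocol {
        private final Delegate delegate = new Delegate();

        @Override
        public Counters getJobCounters(String jobId) {
            return delegate.getJobCounters(jobId);
        }
    }

    // Models step 1: the job asks whichever client protocol it was given.
    static final class Job {
        private final Protocol client;
        private final String jobId;

        Job(Protocol client, String jobId) {
            this.client = client;
            this.jobId = jobId;
        }

        Counters getCounters() {
            return client.getJobCounters(jobId);
        }
    }

    public static void main(String[] args) {
        Job job = new Job(new TezLikeRunner(), "job_0001");
        Counters counters = job.getCounters();
        // Whatever the tasks incremented, the caller sees no counter groups at all.
        System.out.println("empty=" + counters.isEmpty());                              // empty=true
        System.out.println("urls_injected=" + counters.get("injector", "urls_injected")); // urls_injected=0
    }
}
```

The point of the model is that the caller has no way to tell "nothing was counted" from "counters were never fetched"; the fix has to happen where the delegate builds its result.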
(Author: abstractdog)
> Implement Tez counters in Injector job
> --------------------------------------
>
> Key: NUTCH-2839
> URL: https://issues.apache.org/jira/browse/NUTCH-2839
> Project: Nutch
> Issue Type: Sub-task
> Components: injector, tez
> Affects Versions: 1.18
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Fix For: 1.19
>
>
> When running the Injector job on Tez, counters are not populated. This makes
> sense, as all existing counters are created using MapReduce framework Context
> objects. However, this presents a major issue: counters are a requirement
> because they are key to regularly inspecting ongoing crawls, finding errors,
> and debugging. The [org.apache.tez.common.counters|https://tez.apache.org/releases/0.9.2/tez-api-javadocs/index.html?org/apache/tez/common/counters/package-summary.html]
> package may offer an equivalent replacement. This will be investigated in
> this ticket.
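One possible way to frame the investigation: keep the Injector's counting logic behind a small interface so that it no longer depends on a MapReduce Context object, and back that interface with whichever runtime's counters are available. The sketch below is hypothetical throughout: `InjectorCounters`, `InMemoryCounters`, and `injectAll` are illustrative names, not Nutch, Hadoop, or Tez APIs, and the scheme check stands in for Nutch's real URL filtering. A Tez-backed implementation would presumably forward to the `org.apache.tez.common.counters` package, but that wiring is an assumption and is not shown.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class CounterAbstractionSketch {

    // Hypothetical framework-neutral counter sink; not a Nutch, Hadoop, or Tez API.
    interface InjectorCounters {
        void increment(String group, String name, long delta);
    }

    // In-memory implementation (useful for tests). A production implementation would
    // forward to the runtime's counters (MapReduce Context or Tez); not shown here.
    static final class InMemoryCounters implements InjectorCounters {
        private final Map<String, AtomicLong> values = new ConcurrentHashMap<>();

        @Override
        public void increment(String group, String name, long delta) {
            values.computeIfAbsent(group + "." + name, k -> new AtomicLong()).addAndGet(delta);
        }

        long get(String group, String name) {
            AtomicLong v = values.get(group + "." + name);
            return v == null ? 0L : v.get();
        }

        // Sorted copy of all counters, keyed as "group.name".
        Map<String, Long> snapshot() {
            Map<String, Long> out = new TreeMap<>();
            values.forEach((k, v) -> out.put(k, v.get()));
            return out;
        }
    }

    // Hypothetical injection step: counts accepted and filtered URLs without any Context.
    // The http/https check is a placeholder for real URL filters.
    static void injectAll(Iterable<String> urls, InjectorCounters counters) {
        for (String url : urls) {
            if (url.startsWith("http://") || url.startsWith("https://")) {
                counters.increment("injector", "urls_injected", 1);
            } else {
                counters.increment("injector", "urls_filtered", 1);
            }
        }
    }

    public static void main(String[] args) {
        InMemoryCounters counters = new InMemoryCounters();
        injectAll(List.of("https://nutch.apache.org/", "ftp://example.org/", "http://example.com/"), counters);
        // Prints {injector.urls_filtered=1, injector.urls_injected=2}
        System.out.println(counters.snapshot());
    }
}
```

The design choice here is only to decouple "what gets counted" from "which framework stores the counts", so the same Injector code could report through MapReduce or Tez; whether Nutch should be structured this way is exactly what the ticket leaves open.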
--
This message was sent by Atlassian Jira
(v8.20.1#820001)