[ https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881792#comment-17881792 ]
Sebastian Nagel commented on NUTCH-3059:
----------------------------------------

Note: the test above was run in pseudo-distributed mode, because in local mode only one partition per segment is generated. The counters are correct, as a comparison with the segment counts shows:
{noformat}
$> nutch readseg -list -dir
NAME            GENERATED  FETCHER START  FETCHER END  FETCHED  PARSED
20240914162841  1000       ?              ?            ?        ?
20240914162906  399        ?              ?            ?        ?
{noformat}

> Generator: selector job does not count reduce output records
> ------------------------------------------------------------
>
>                 Key: NUTCH-3059
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3059
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.21
>
>
> The selector step (job) of the Generator does not count the reduce output
> records, or rather it shows the count "0":
> {noformat}
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: starting
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: selecting best-scoring urls due for fetch.
> ...
>     Map-Reduce Framework
>         Map input records=6
>         Map output records=6
> ...
>         Combine input records=0
>         Combine output records=0
>         Reduce input groups=1
>         Reduce shuffle bytes=594
>         Reduce input records=6
>         Reduce output records=0
>         Spilled Records=12
> ...
> {noformat}
> Not a big issue, but we should investigate why this happens. The other
> counters seem to work properly, and the partitioner job does show the reduce
> output records. The issue is observed in both local and distributed mode.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
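The comparison with segment counts in the comment can be scripted when there are many segments. The Python sketch below sums the GENERATED column of a `nutch readseg -list` listing, assuming the whitespace-separated layout shown in the comment (a header row starting with NAME, one row per segment, `?` for values not yet known). `total_generated` is a hypothetical helper, not part of Nutch; the per-segment values and their total are what the job counters can be checked against.

```python
def total_generated(listing: str) -> int:
    """Sum the GENERATED column of a `nutch readseg -list` listing.

    Hypothetical helper: assumes the layout shown in the comment, i.e. a
    header row starting with NAME followed by one row per segment, with
    '?' for values that are not (yet) known.
    """
    rows = [line.split() for line in listing.splitlines() if line.strip()]
    # Skip anything before the header row (e.g. the command line itself).
    start = next(i for i, row in enumerate(rows) if row[0] == "NAME")
    # Only the NAME column precedes GENERATED, so token positions line up
    # even though later header names ("FETCHER START") contain spaces.
    col = rows[start].index("GENERATED")
    total = 0
    for row in rows[start + 1:]:
        if len(row) > col and row[col] != "?":
            total += int(row[col])
    return total


listing = """
$> nutch readseg -list -dir
NAME            GENERATED  FETCHER START  FETCHER END  FETCHED  PARSED
20240914162841  1000       ?              ?            ?        ?
20240914162906  399        ?              ?            ?        ?
"""
print(total_generated(listing))  # 1399
```

For the two segments listed, this gives 1000 + 399 = 1399 generated URLs.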