[ https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881792#comment-17881792 ]
Sebastian Nagel commented on NUTCH-3059:
----------------------------------------
Note: the above test was run in pseudo-distributed mode because in local mode
only one partition per segment is generated. The counters are correct, as shown
by comparison with segment counts:
{noformat}
$> nutch readseg -list -dir
NAME            GENERATED  FETCHER START  FETCHER END  FETCHED  PARSED
20240914162841  1000       ?              ?            ?        ?
20240914162906  399        ?              ?            ?        ?
{noformat}
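For anyone repeating the comparison, a minimal sketch of how the per-segment GENERATED counts could be totalled from such a listing. The function name `sum_generated` and the rule for recognising segment rows (a 14-digit name followed by a numeric count) are assumptions made for illustration and are not part of Nutch. The total still has to be compared by hand with the counters reported by the Generator job.

```python
def sum_generated(listing: str) -> int:
    """Sum the GENERATED column of a `readseg -list` style listing.

    Hypothetical helper: it assumes each segment row starts with a
    14-digit timestamp name followed by the generated count.
    """
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        # Skip header lines and anything that is not a segment row.
        if len(fields) < 2:
            continue
        name, generated = fields[0], fields[1]
        if len(name) == 14 and name.isdigit() and generated.isdigit():
            total += int(generated)
    return total


# The listing from the comment above, pasted as plain text.
LISTING = """\
NAME GENERATED FETCHER START FETCHER END FETCHED PARSED
20240914162841 1000 ? ? ? ?
20240914162906 399 ? ? ? ?
"""

if __name__ == "__main__":
    print(sum_generated(LISTING))
```

Rows whose GENERATED field is not a plain number (for example "?") are ignored rather than counted as zero, so an incomplete listing gives a lower total instead of an error.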
> Generator: selector job does not count reduce output records
> ------------------------------------------------------------
>
> Key: NUTCH-3059
> URL: https://issues.apache.org/jira/browse/NUTCH-3059
> Project: Nutch
> Issue Type: Bug
> Components: generator
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Priority: Minor
> Fix For: 1.21
>
>
> The selector step (job) of the Generator does not count the reduce output
> records: the counter shows "0":
> {noformat}
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: starting
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: selecting best-scoring urls due for fetch.
> ...
> Map-Reduce Framework
> Map input records=6
> Map output records=6
> ...
> Combine input records=0
> Combine output records=0
> Reduce input groups=1
> Reduce shuffle bytes=594
> Reduce input records=6
> Reduce output records=0
> Spilled Records=12
> ...
> {noformat}
> Not a big issue, but we should investigate why this happens. The other counters
> seem to work properly, and the partitioner job does show the reduce output
> records. The issue is observed in both local and distributed mode.