[ 
https://issues.apache.org/jira/browse/NUTCH-3059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17881792#comment-17881792
 ] 

Sebastian Nagel commented on NUTCH-3059:
----------------------------------------

Note: the above test was run in pseudo-distributed mode because in local mode 
only one partition per segment is generated. The counters are correct, as shown 
by comparison with segment counts:
{noformat}
$> nutch readseg -list -dir
NAME            GENERATED  FETCHER START  FETCHER END  FETCHED  PARSED
20240914162841  1000       ?              ?            ?        ?
20240914162906  399        ?              ?            ?        ?
{noformat}
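
The comparison can also be scripted. The following is only a minimal sketch in Python, assuming the listing has the layout shown above (segment name in the first column, GENERATED count in the second). The helper name sum_generated is hypothetical and not part of Nutch:

```python
def sum_generated(listing: str) -> int:
    """Sum the GENERATED column of `readseg -list` style output.

    Hypothetical helper for illustration; assumes each segment row starts
    with a numeric segment name followed by a numeric GENERATED count.
    """
    total = 0
    for line in listing.splitlines():
        fields = line.split()
        # Header and wrapped header lines do not start with digits: skip them.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            total += int(fields[1])
    return total


# Listing copied from the readseg output in this comment.
listing = """\
NAME            GENERATED       FETCHER START           FETCHER END             FETCHED PARSED
20240914162841  1000            ?               ?       ?       ?
20240914162906  399             ?               ?       ?       ?
"""

print(sum_generated(listing))  # 1000 + 399 = 1399
```

The resulting total is the number to hold against the record counters reported by the Generator jobs.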

> Generator: selector job does not count reduce output records
> ------------------------------------------------------------
>
>                 Key: NUTCH-3059
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3059
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.21
>
>
> The selector step (job) of the Generator does not count the reduce output 
> records; the counter shows "0":
> {noformat}
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: starting
> 2024-06-05 13:57:09,299 INFO o.a.n.c.Generator [main] Generator: selecting best-scoring urls due for fetch.
> ...
>          Map-Reduce Framework
>                 Map input records=6
>                 Map output records=6
>                 ...
>                 Combine input records=0
>                 Combine output records=0
>                 Reduce input groups=1
>                 Reduce shuffle bytes=594
>                 Reduce input records=6
>                 Reduce output records=0
>                 Spilled Records=12
>                 ...
> {noformat}
> Not a big issue, but worth investigating why this happens. The other counters 
> seem to work properly, and the partitioner job does show the reduce output 
> records. The issue is observed in both local and distributed mode.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
