[ https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841472#comment-17841472 ]

ASF GitHub Bot commented on NUTCH-3043:
---------------------------------------

sebastian-nagel commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2080634329

   Hi @lewismc:
   - "use parameterized logging": done
   - "augment the [metrics documentation](https://cwiki.apache.org/confluence/display/NUTCH/Metrics) once this is merged.": will do
   - "we could also [create a test for the counters](https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial#MRUnitTutorial-TestingCounters).": for now, TestGenerator is not based on MRUnit. The various Generator::generate(...) methods return the number of generated segments without a way to access the counters (they're logged, however). I'd prefer to track this in a separate issue, because reading the counters would require too many code changes.
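
   To illustrate the kind of change a follow-up issue might involve, here is a minimal, self-contained sketch in plain Java. It is not Nutch code: `GenerateResult`, `generate()` and the counter names and values are all hypothetical, chosen only to show a return value that carries the counters along with the segment count so a test could assert on them.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only: none of these types or counter names are taken from Nutch.
public class GenerateResultSketch {

  /** Hypothetical result type bundling the segment count with counter values. */
  static final class GenerateResult {
    private final int numSegments;
    private final Map<String, Long> counters;

    GenerateResult(int numSegments, Map<String, Long> counters) {
      this.numSegments = numSegments;
      // Defensive copy so later changes to the caller's map do not leak in.
      this.counters = Collections.unmodifiableMap(new LinkedHashMap<>(counters));
    }

    int getNumSegments() {
      return numSegments;
    }

    /** Returns the counter value, or 0 if the counter was never incremented. */
    long getCounter(String name) {
      return counters.getOrDefault(name, 0L);
    }
  }

  /** Stand-in for a generate call; the numbers are made up for illustration. */
  static GenerateResult generate() {
    Map<String, Long> counters = new LinkedHashMap<>();
    counters.put("URL_FILTERS_REJECTED", 3L); // assumed counter name
    counters.put("SCHEDULE_REJECTED", 1L);    // assumed counter name
    return new GenerateResult(2, counters);
  }

  public static void main(String[] args) {
    GenerateResult result = generate();
    System.out.println("segments: " + result.getNumSegments());
    System.out.println("rejected by URL filters: "
        + result.getCounter("URL_FILTERS_REJECTED"));
  }
}
```

   With a result object along these lines, a unit test could check the counter values directly instead of relying on the log output.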




> Generator: count URLs rejected by URL filters
> ---------------------------------------------
>
>                 Key: NUTCH-3043
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3043
>             Project: Nutch
>          Issue Type: Improvement
>          Components: generator
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.21
>
>
> Generator already counts URLs rejected by the (re)fetch scheduler, by fetch 
> interval or status. It should also count the number of URLs rejected by URL 
> filters.
> See also [Generator 
> metrics|https://cwiki.apache.org/confluence/display/NUTCH/Metrics#Metrics-Generator].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
