[ https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841472#comment-17841472 ]
ASF GitHub Bot commented on NUTCH-3043:
---------------------------------------

sebastian-nagel commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2080634329

   Hi @lewismc:
   - "use parameterized logging": done
   - "augment the [metrics documentation](https://cwiki.apache.org/confluence/display/NUTCH/Metrics) once this is merged.": will do
   - "we could also [create a test for the counters](https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial#MRUnitTutorial-TestingCounters).": for now, TestGenerator is not based on MRUNIT. The various Generator::generate(...) methods return the number of generated segments without a way to access the counters (they're logged, however). I'd prefer to track this in a separate issue, because reading the counters would require too many code changes.

> Generator: count URLs rejected by URL filters
> ---------------------------------------------
>
>                 Key: NUTCH-3043
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3043
>             Project: Nutch
>          Issue Type: Improvement
>          Components: generator
>    Affects Versions: 1.20
>            Reporter: Sebastian Nagel
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.21
>
>
> Generator already counts URLs rejected by the (re)fetch scheduler, by fetch
> interval or status. It should also count the number of URLs rejected by URL
> filters.
> See also [Generator metrics|https://cwiki.apache.org/confluence/display/NUTCH/Metrics#Metrics-Generator].

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
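The counting that NUTCH-3043 asks for can be illustrated with a minimal, self-contained Java sketch. It is not the code from PR #814: `SimpleUrlFilter`, `GeneratorCounter` (and its constant names) and `FilterCountingSketch` are hypothetical stand-ins. The filter interface follows the convention of Nutch URL filters, where returning `null` rejects a URL. Instead of Hadoop counters, the sketch increments entries in a caller-supplied map, which also shows how a test could read the counts directly; in an actual Hadoop job the increment would go through the task context, e.g. `context.getCounter(group, name).increment(1)`.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified URL filter: returns the (possibly modified) URL, or null to reject it. */
interface SimpleUrlFilter {
  String filter(String url);
}

/** Hypothetical counter names; the real Generator counters may be named differently. */
enum GeneratorCounter {
  URLS_ACCEPTED,
  URLS_REJECTED_BY_FILTERS
}

final class FilterCountingSketch {

  private FilterCountingSketch() {}

  /** Creates a counter map with every counter initialized to 0. */
  static Map<GeneratorCounter, Long> newCounters() {
    Map<GeneratorCounter, Long> counters = new EnumMap<>(GeneratorCounter.class);
    for (GeneratorCounter c : GeneratorCounter.values()) {
      counters.put(c, 0L);
    }
    return counters;
  }

  /**
   * Passes each URL through all filters in order. A URL is rejected as soon as
   * one filter returns null; every rejection is counted instead of being
   * dropped silently. Returns the accepted (possibly modified) URLs.
   */
  static List<String> filterAndCount(List<String> urls, List<SimpleUrlFilter> filters,
      Map<GeneratorCounter, Long> counters) {
    List<String> accepted = new ArrayList<>();
    for (String url : urls) {
      String current = url;
      for (SimpleUrlFilter f : filters) {
        current = f.filter(current);
        if (current == null) {
          break; // rejected: later filters never see this URL
        }
      }
      if (current == null) {
        counters.merge(GeneratorCounter.URLS_REJECTED_BY_FILTERS, 1L, Long::sum);
      } else {
        counters.merge(GeneratorCounter.URLS_ACCEPTED, 1L, Long::sum);
        accepted.add(current);
      }
    }
    return accepted;
  }
}
```

For example, with one filter that rejects non-HTTP(S) URLs and another that rejects `.pdf` links, the input `http://a.example/`, `ftp://b.example/`, `https://c.example/x.pdf`, `https://d.example/` yields two accepted URLs and a rejection count of 2. Pre-initializing every counter to 0 means a caller (or a test) reads 0 rather than `null` for a counter that was never incremented.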