[
https://issues.apache.org/jira/browse/NUTCH-3043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17841472#comment-17841472
]
ASF GitHub Bot commented on NUTCH-3043:
---------------------------------------
sebastian-nagel commented on PR #814:
URL: https://github.com/apache/nutch/pull/814#issuecomment-2080634329
Hi @lewismc:
- "use parameterized logging": done
- "augment the [metrics
documentation](https://cwiki.apache.org/confluence/display/NUTCH/Metrics) once
this is merged.": will do
- "we could also [create a test for the
counters](https://cwiki.apache.org/confluence/display/MRUNIT/MRUnit+Tutorial#MRUnitTutorial-TestingCounters).":
for now, TestGenerator is not based on MRUnit. The various
Generator::generate(...) methods return the number of generated segments without
a way to access the counters (they're logged, however). I'd prefer to track this
in a separate issue, because it would require too many code changes to read the
counters.
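
To illustrate the kind of assertion a counter test could make once the counters
are readable, here is a minimal, self-contained sketch. It does not use Nutch,
Hadoop or MRUnit: the class `RejectionCounterSketch`, the `Counter` enum and its
names, the `Counters` tally and the `select(...)` method are all hypothetical
stand-ins, and the two predicates only mimic a URL filter and a fetch schedule
check.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from Nutch or Hadoop.
class RejectionCounterSketch {

  /** Assumed counter names, chosen for illustration only. */
  enum Counter { URL_FILTERS_REJECTED, SCHEDULE_REJECTED, SELECTED }

  /** In-memory stand-in for job counters that a test can read back. */
  static final class Counters {
    private final Map<Counter, Long> counts = new EnumMap<>(Counter.class);

    void increment(Counter counter) {
      counts.merge(counter, 1L, Long::sum);
    }

    long get(Counter counter) {
      return counts.getOrDefault(counter, 0L);
    }
  }

  /**
   * Walks over the URLs, applies the URL filter first and the schedule check
   * second, and counts each outcome instead of only logging it.
   */
  static Counters select(List<String> urls, Predicate<String> urlFilter,
      Predicate<String> dueForFetch) {
    Counters counters = new Counters();
    for (String url : urls) {
      if (!urlFilter.test(url)) {
        counters.increment(Counter.URL_FILTERS_REJECTED);
        continue;
      }
      if (!dueForFetch.test(url)) {
        counters.increment(Counter.SCHEDULE_REJECTED);
        continue;
      }
      counters.increment(Counter.SELECTED);
    }
    return counters;
  }

  public static void main(String[] args) {
    List<String> urls = List.of("http://a.example/", "ftp://b.example/",
        "http://c.example/skip");
    // Toy filter: accept only http URLs. Toy schedule: skip URLs ending in /skip.
    Counters counters = select(urls, u -> u.startsWith("http://"),
        u -> !u.endsWith("/skip"));
    for (Counter counter : Counter.values()) {
      System.out.println(counter + " = " + counters.get(counter));
    }
  }
}
```

The point of the sketch is only that the selection step hands back its counters,
so a test can assert on them directly rather than parsing log output.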
> Generator: count URLs rejected by URL filters
> ---------------------------------------------
>
> Key: NUTCH-3043
> URL: https://issues.apache.org/jira/browse/NUTCH-3043
> Project: Nutch
> Issue Type: Improvement
> Components: generator
> Affects Versions: 1.20
> Reporter: Sebastian Nagel
> Assignee: Sebastian Nagel
> Priority: Minor
> Fix For: 1.21
>
>
> Generator already counts URLs rejected by the (re)fetch scheduler, by fetch
> interval or status. It should also count the number of URLs rejected by URL
> filters.
> See also [Generator
> metrics|https://cwiki.apache.org/confluence/display/NUTCH/Metrics#Metrics-Generator].
--
This message was sent by Atlassian Jira
(v8.20.10#820010)