[ 
https://issues.apache.org/jira/browse/NUTCH-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18063316#comment-18063316
 ] 

Lewis John McGibbney edited comment on NUTCH-3162 at 3/5/26 7:40 PM:
---------------------------------------------------------------------

Fantastic [~snagel], thank you for this. The bugs you raise stem from Nutch 
currently not checking metrics for unreasonable values (e.g., negative values 
or impossible ratios). As a result, we can experience silent failures and 
incorrect monitoring.

The solution is to add metrics validation in Nutch and alerting thresholds 
further downstream.

I'll get to work on the fixes soon.


was (Author: lewismc):
Fantastic [~snagel] thank you for this. I'll get to work on the fixes soon.

> Latency metrics to properly merge data from all threads and tasks
> -----------------------------------------------------------------
>
>                 Key: NUTCH-3162
>                 URL: https://issues.apache.org/jira/browse/NUTCH-3162
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, indexer, parser
>    Affects Versions: 1.22
>            Reporter: Sebastian Nagel
>            Priority: Major
>             Fix For: 1.23
>
>
> The latency metrics (NUTCH-3134) have two issues:
> 1. Only the data from one thread is used if a tool is multi-threaded. 
> That's definitely the case for Fetcher. The "emitCounters" methods need to 
> increment the counter values instead of calling "setValue". However, this is 
> not the correct approach for the percentiles, see also the next point.
> 2. When running in full cluster mode with multiple parallel tasks, the task 
> counters are summed up to the job counter value. However, the values of the 
> latency percentiles then turn out to be too high, because a sum of per-task 
> percentiles is not a percentile of the combined data.
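
To illustrate the two points in the description: percentiles cannot be added 
up across threads or tasks, but the bucket counts of a histogram can, and a 
percentile can then be read from the merged buckets. The following is only a 
rough sketch under stated assumptions, not the Nutch implementation or the 
planned fix. The class LatencyHistogram, its methods, and the bucket bounds 
are hypothetical and do not come from Nutch or NUTCH-3134.

```java
/**
 * Illustrative sketch only: a hypothetical mergeable latency histogram.
 * Class name, method names and bucket bounds are assumptions, not Nutch code.
 * Not thread-safe: each thread or task is assumed to own its own instance.
 */
final class LatencyHistogram {
    /** Assumed bucket upper bounds in milliseconds (inclusive). */
    private static final long[] BOUNDS_MS = {10, 50, 100, 500, 1000, 5000, Long.MAX_VALUE};

    private final long[] counts = new long[BOUNDS_MS.length];

    /** Records one latency observation in the first bucket that can hold it. */
    void record(long latencyMs) {
        for (int i = 0; i < BOUNDS_MS.length; i++) {
            if (latencyMs <= BOUNDS_MS[i]) {
                counts[i]++;
                return;
            }
        }
    }

    /** Merges another histogram. Bucket counts, unlike percentiles, can be summed. */
    void merge(LatencyHistogram other) {
        for (int i = 0; i < counts.length; i++) {
            counts[i] += other.counts[i];
        }
    }

    /** Total number of recorded observations. */
    long total() {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Upper bound (ms) of the bucket containing the p-quantile, 0 < p <= 1. */
    long percentileUpperBoundMs(double p) {
        long total = total();
        if (total == 0) {
            return 0;
        }
        long rank = (long) Math.ceil(p * total);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) {
                return BOUNDS_MS[i];
            }
        }
        return BOUNDS_MS[BOUNDS_MS.length - 1];
    }

    public static void main(String[] args) {
        // Made-up samples standing in for two fetcher threads.
        LatencyHistogram a = new LatencyHistogram();
        for (long ms : new long[] {5, 8, 40}) {
            a.record(ms);
        }
        LatencyHistogram b = new LatencyHistogram();
        for (long ms : new long[] {20, 30, 400}) {
            b.record(ms);
        }

        LatencyHistogram merged = new LatencyHistogram();
        merged.merge(a);
        merged.merge(b);

        long summed = a.percentileUpperBoundMs(0.95) + b.percentileUpperBoundMs(0.95);
        System.out.println("sum of per-thread p95 bounds: " + summed);
        System.out.println("p95 bound of merged data:     " + merged.percentileUpperBoundMs(0.95));
    }
}
```

With these made-up samples, adding the two per-thread p95 bounds gives 550 ms, 
while the merged histogram reports 500 ms. The sketch trades precision for 
mergeability: the result is only as fine-grained as the assumed bucket bounds.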



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
