Lewis John McGibbney created NUTCH-3134:
-------------------------------------------

             Summary: Add latency metrics with percentile support to Fetcher, Parser, and Indexer
                 Key: NUTCH-3134
                 URL: https://issues.apache.org/jira/browse/NUTCH-3134
             Project: Nutch
          Issue Type: Sub-task
          Components: fetcher, indexer, parser
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
             Fix For: 1.22


This task adds timing metrics to the fetching, parsing, and indexing jobs. 
Other jobs could be covered later, but these three are a good start. The 
timing metrics should include percentile support using TDigest 
(https://github.com/tdunning/t-digest), which Nutch already depends on. This 
would allow fetch, parse, and indexing latency to be tracked, with p50/p95/p99 
values exposed via Hadoop counters.
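
As a rough illustration of the idea, here is a minimal sketch of recording 
per-item latencies and deriving p50/p95/p99 values. The class name 
LatencyTracker, its methods, and the counter names are hypothetical and are not 
existing Nutch code. To stay runnable with only the JDK, the sketch keeps every 
sample and computes exact nearest-rank percentiles; an actual implementation 
would presumably replace the sample list with a TDigest and publish the values 
through Hadoop counters rather than a plain Map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Nutch. */
class LatencyTracker {
  private final String prefix;
  // Stand-in for a TDigest: stores every sample, which is exact but unbounded.
  private final List<Long> samplesMs = new ArrayList<>();

  LatencyTracker(String prefix) {
    this.prefix = prefix;
  }

  /** Records one latency measurement in milliseconds. */
  void record(long latencyMs) {
    samplesMs.add(latencyMs);
  }

  /** Nearest-rank percentile for pct in 1..100; returns 0 when empty. */
  long percentile(int pct) {
    if (samplesMs.isEmpty()) {
      return 0L;
    }
    List<Long> sorted = new ArrayList<>(samplesMs);
    Collections.sort(sorted);
    int n = sorted.size();
    // Integer ceiling of pct * n / 100, clamped to at least rank 1.
    int rank = (int) (((long) pct * n + 99) / 100);
    return sorted.get(Math.max(rank, 1) - 1);
  }

  /** Long-valued snapshot, mirroring the fact that Hadoop counters are longs. */
  Map<String, Long> toCounters() {
    Map<String, Long> counters = new LinkedHashMap<>();
    counters.put(prefix + "_latency_count", (long) samplesMs.size());
    counters.put(prefix + "_latency_p50_ms", percentile(50));
    counters.put(prefix + "_latency_p95_ms", percentile(95));
    counters.put(prefix + "_latency_p99_ms", percentile(99));
    return counters;
  }
}
```

A caller would wrap each fetch, parse, or index operation with a start and end 
timestamp, pass the difference to record(), and publish toCounters() when the 
task finishes. Because Hadoop sums counter values across tasks, how per-task 
percentiles are combined into job-level figures is a design question this 
sketch leaves open.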

Latency distributions will be useful for:
 * Identifying performance bottlenecks in crawl jobs
 * Tuning fetch/parse/index configurations 
 * Detecting anomalies in processing times



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
