[
https://issues.apache.org/jira/browse/NUTCH-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18046082#comment-18046082
]
Lewis John McGibbney commented on NUTCH-3131:
---------------------------------------------
I plan to add more to this issue, but I won't be able to get around to it for a
little while.
> Nutch Metrics Refactoring & Enhancements
> ----------------------------------------
>
> Key: NUTCH-3131
> URL: https://issues.apache.org/jira/browse/NUTCH-3131
> Project: Nutch
> Issue Type: Improvement
> Components: metrics
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Priority: Major
> Attachments: nutch_metrics_report.pdf
>
>
> The recent “Hadoop Metrics Analysis and Improvement Suggestions” report
> (attached PDF) identified 88 counter increment operations across 17 files,
> along with several anti-patterns (inconsistent naming, repeated counter
> lookups, cardinality risks, etc.) and missing key observability signals
> (such as latency and error context).
> This ticket covers the implementation of all high- and medium-priority
> improvements from the report (Phases 1–3). Phase 4 (external export,
> dashboard, testing) will be handled in separate tickets.
> Acceptance criteria for consideration:
> * All Hadoop counter group and counter names are defined in a single source
> of truth (new class NutchMetricConstants or NutchMetrics).
> * No hardcoded counter group/name strings remain in the 17 affected files.
> * All frequently used counters (especially in hot paths: Fetcher,
> FetcherThread, Generator, Parser, Indexer) are cached in instance variables
> during setup(Context) / setup() and reused.
> * Latency metrics (fetch, parse, index) are added with proper timing and
> recorded via Hadoop counters (average + count).
> * Error counters include error type/context where feasible (at least
> class-level granularity).
> * Counter naming is fully standardized (camelCase counters, PascalCase
> groups).
> * A lightweight MetricsHelper utility class exists and is used across
> components.
> * Thread-safe accumulation (AtomicLong/TDigest) is consolidated via
> ThreadSafeMetrics or equivalent and flushed correctly to Hadoop counters in
> cleanup().
> * Resource utilization metrics (queue sizes, depths) are added for Fetcher.
> * Basic metrics validation is executed at the end of each job (warn on
> impossible conditions).
> * No regression in existing counter values (verified via existing
> integration tests or new sanity job).
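A minimal sketch of how several of these criteria could fit together: a constants class as the single source of counter names, counters resolved once at setup time, AtomicLong accumulation from fetcher threads flushed once in cleanup(), latency kept as a total plus a count, and an end-of-job validation check. `NutchMetrics`, `MetricsHelper`, `CounterSink`, `InMemoryCounter` and all counter names are hypothetical. `CounterSink` stands in for Hadoop's `org.apache.hadoop.mapreduce.Counter` (which does provide `increment(long)` and `getValue()`) so the example compiles without Hadoop; in a real job the lookup function would wrap `context.getCounter(group, name)`. Latency is stored as a millisecond total next to the attempt count rather than as an average, because Hadoop sums counter values across tasks and summed per-task averages would not be meaningful.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiFunction;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Counter, reduced to the
// two methods this sketch needs, so it compiles without Hadoop on the classpath.
interface CounterSink {
  void increment(long amount);
  long getValue();
}

// In-memory CounterSink used only to exercise the sketch.
final class InMemoryCounter implements CounterSink {
  private final AtomicLong value = new AtomicLong();

  @Override
  public void increment(long amount) { value.addAndGet(amount); }

  @Override
  public long getValue() { return value.get(); }
}

// Hypothetical single source of truth for counter names:
// PascalCase group, camelCase counters.
final class NutchMetrics {
  private NutchMetrics() {}

  static final String GROUP_FETCHER = "Fetcher";
  static final String FETCH_ATTEMPTS = "fetchAttempts";
  static final String FETCH_SUCCESS = "fetchSuccess";
  static final String FETCH_LATENCY_TOTAL_MS = "fetchLatencyTotalMs";
}

// Hypothetical MetricsHelper: counters are resolved once (setup), fetcher
// threads accumulate into AtomicLongs, and flush() runs once from cleanup().
final class MetricsHelper {
  private final AtomicLong attempts = new AtomicLong();
  private final AtomicLong successes = new AtomicLong();
  private final AtomicLong latencyTotalMs = new AtomicLong();

  // Cached counter references, looked up once instead of on every increment.
  private final CounterSink attemptsCounter;
  private final CounterSink successCounter;
  private final CounterSink latencyCounter;

  // getCounter is a (group, name) -> counter lookup, e.g. an adapter around
  // context.getCounter(group, name) in a real mapper.
  MetricsHelper(BiFunction<String, String, CounterSink> getCounter) {
    String group = NutchMetrics.GROUP_FETCHER;
    attemptsCounter = getCounter.apply(group, NutchMetrics.FETCH_ATTEMPTS);
    successCounter = getCounter.apply(group, NutchMetrics.FETCH_SUCCESS);
    latencyCounter = getCounter.apply(group, NutchMetrics.FETCH_LATENCY_TOTAL_MS);
  }

  // Thread-safe: called by fetcher threads after each fetch attempt.
  void recordFetch(boolean success, long elapsedMs) {
    attempts.incrementAndGet();
    if (success) {
      successes.incrementAndGet();
    }
    latencyTotalMs.addAndGet(elapsedMs);
  }

  // Called from cleanup() once worker threads have finished; pushes the
  // accumulated deltas and resets them, so a repeated flush adds nothing.
  void flush() {
    attemptsCounter.increment(attempts.getAndSet(0));
    successCounter.increment(successes.getAndSet(0));
    latencyCounter.increment(latencyTotalMs.getAndSet(0));
  }

  // Average latency derived from summed counters (total / count).
  static double averageLatencyMs(long latencyTotalMs, long attempts) {
    return attempts == 0 ? 0.0 : (double) latencyTotalMs / attempts;
  }

  // End-of-job sanity check: returns one warning per impossible condition.
  static List<String> validate(long attempts, long successes, long latencyTotalMs) {
    List<String> warnings = new ArrayList<>();
    if (successes > attempts) {
      warnings.add("fetchSuccess (" + successes + ") exceeds fetchAttempts (" + attempts + ")");
    }
    if (attempts < 0 || successes < 0 || latencyTotalMs < 0) {
      warnings.add("negative counter value");
    }
    return warnings;
  }
}
```

Under these assumptions, the helper would be built in setup(Context), fed from the fetcher threads, and flushed once in cleanup(Context); `validate(...)` and `averageLatencyMs(...)` would then run on the job's aggregated counter values after completion.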