lewismc opened a new pull request, #906: URL: https://github.com/apache/nutch/pull/906
PR for [NUTCH-3162](https://issues.apache.org/jira/browse/NUTCH-3162), which addresses shortcomings in the job-level latency percentiles (p50, p95, p99) for Fetcher, ParseSegment, and Indexer. It merges the TDigest data from all map tasks and threads and writes the counters in a single reducer. For Indexer, this happens in a dedicated merge job instead. It should fix the cases where per-task percentile counters were summed instead of being merged.

This patch touches the following jobs:

* Fetcher: per-thread latency is merged in the mapper. A single reducer then merges the TDigests and sets the job-level p50/p95/p99 counters.
* ParseSegment:
  * The mapper emits its latency digest under `LATENCY_KEY`.
  * A custom partitioner sends `LATENCY_KEY` to partition 0, so one reducer merges all TDigests.
  * That reducer merges the digests and sets the correct percentile counters.
* Indexer:
  * Each reducer writes its TDigest to a side output.
  * IndexingJob then runs *a new* “Indexer Latency Merge” job, which merges the reducer digests and sets the percentile counters. If the merge fails, the failure is only logged with `LOG.error` and categorized by the driver-level `ErrorTracker`.

I think this fixes the issues. Arguably it is more complex than logging to a file and running some ETL to extract metrics from the logs. However, this solution follows convention by keeping metrics within the Hadoop ecosystem.

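The core problem is that per-task percentile counters were summed instead of merged. A small example shows why that fails. The sketch below is illustrative only. It uses exact lists of latency samples as a stand-in for TDigest, since a TDigest merge approximates the same merged percentile in bounded memory. The latency values are made up, and `PercentileMergeDemo`, `percentile`, and `merge` are hypothetical names, not code from the patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Illustration only (hypothetical class): exact sample lists stand in for
 * TDigest. It shows that adding per-task percentiles does not give the
 * job-level percentile, while merging the underlying distributions does.
 */
public class PercentileMergeDemo {

  /** Nearest-rank percentile, p in (0, 100], computed on a sorted copy. */
  static long percentile(List<Long> samples, double p) {
    if (samples.isEmpty()) {
      throw new IllegalArgumentException("no samples");
    }
    List<Long> sorted = new ArrayList<>(samples);
    Collections.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.size());
    return sorted.get(Math.max(rank, 1) - 1);
  }

  /** Merge step: pool every task's samples into one distribution. */
  static List<Long> merge(List<List<Long>> perTask) {
    List<Long> all = new ArrayList<>();
    for (List<Long> task : perTask) {
      all.addAll(task);
    }
    return all;
  }

  public static void main(String[] args) {
    // Made-up fetch latencies (ms) from two map tasks.
    List<Long> taskA = List.of(10L, 12L, 11L, 13L, 500L);
    List<Long> taskB = List.of(20L, 21L, 22L, 19L, 18L);

    // Wrong: what happens when per-task percentile counters are summed.
    long summedP50 = percentile(taskA, 50) + percentile(taskB, 50);
    // Right: merge first, then read the percentile once.
    long mergedP50 = percentile(merge(List.of(taskA, taskB)), 50);

    System.out.println("sum of per-task p50 = " + summedP50 + " ms");
    System.out.println("merged p50          = " + mergedP50 + " ms");
  }
}
```

With these made-up values, adding the per-task medians gives 32 ms. That is larger than every sample except the 500 ms outlier. The median of the merged distribution is 18 ms. This is the reason to merge the digests in one place before setting the counters.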

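The ParseSegment routing rule can be sketched as a standalone method shaped like Hadoop's `Partitioner#getPartition(key, value, numPartitions)`. This is a sketch under assumptions, not the PR's partitioner. It has no Hadoop dependency and uses `String` keys. The value of `LATENCY_KEY` and the class name `LatencyKeyPartitionerSketch` are hypothetical. Non-latency keys fall back to the same formula as Hadoop's default `HashPartitioner`.

```java
/**
 * Standalone sketch (hypothetical class, no Hadoop dependency) of the
 * ParseSegment routing rule: the latency key always goes to partition 0,
 * other keys are hash-partitioned like Hadoop's default HashPartitioner.
 */
public class LatencyKeyPartitionerSketch {

  /** Hypothetical sentinel value; the real LATENCY_KEY in the PR may differ. */
  static final String LATENCY_KEY = "__latency__";

  /** Same parameter shape as Hadoop's Partitioner#getPartition. */
  static int getPartition(String key, Object value, int numPartitions) {
    if (LATENCY_KEY.equals(key)) {
      // Every mapper's digest lands on one reducer, which merges them all.
      return 0;
    }
    // Mask the sign bit so negative hash codes still map into [0, numPartitions).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }

  public static void main(String[] args) {
    int reducers = 4;
    System.out.println("latency digest -> partition " + getPartition(LATENCY_KEY, null, reducers));
    System.out.println("regular URL    -> partition "
        + getPartition("https://example.org/", null, reducers));
  }
}
```

One consequence of this design is that the partition 0 reducer also receives its normal hash share of keys, so it does slightly more work than the others. In exchange, ParseSegment needs no extra job to merge its digests.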