[
https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473458
]
David Bowen commented on HADOOP-492:
------------------------------------
This requirement is not an exact match with the Metrics API. A MetricsRecord
has a number of capabilities that aren't relevant here:
* gauges as well as counters
* adding any number of tags to the data to support various ways of
aggregating it
* atomic update of multiple metrics
* removing metrics
So I don't think it makes sense to expose any aspect of the Metrics API here.
We can simply add one method to Reporter:
    void incrCounter(String name, long amount);
Behind the scenes, we can automatically send this data to the Metrics API with
appropriate tags, as well as aggregating it into the TaskStatus and JobStatus
objects so that it is accessible via JobClient.
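
To make the proposal concrete, here is a minimal sketch of how the single method might look from a task's point of view. The Reporter interface below is a hypothetical cut-down stand-in that carries only the proposed method, not the real org.apache.hadoop.mapred.Reporter, and TaskCounters, get and snapshot are names invented for illustration. Forwarding to the Metrics API and to TaskStatus/JobStatus is left out.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cut-down Reporter: only the proposed counter method. */
interface Reporter {
    void incrCounter(String name, long amount);
}

/** Illustrative per-task holder that accumulates increments in memory. */
class TaskCounters implements Reporter {
    private final Map<String, Long> counters = new TreeMap<>();

    @Override
    public synchronized void incrCounter(String name, long amount) {
        // Create the counter on first use, otherwise add to the running total.
        counters.merge(name, amount, Long::sum);
    }

    /** Current value of a counter, or 0 if it was never incremented. */
    public synchronized long get(String name) {
        return counters.getOrDefault(name, 0L);
    }

    /** Copy of all counters, e.g. for attaching to a status update. */
    public synchronized Map<String, Long> snapshot() {
        return new TreeMap<>(counters);
    }
}
```

User code would only ever call incrCounter; everything else in the sketch is the kind of bookkeeping the framework would do behind the scenes.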
We would have some counters that are maintained by the framework. Currently,
these would be:
shuffle_input_bytes
map_input_records
map_input_bytes
map_output_records
map_output_bytes
reduce_input_records
reduce_output_records
Do we need some sort of counter naming convention to prevent future conflicts
between framework-maintained counters and user-defined counters?
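
One possible answer, sketched only to illustrate the question: reserve a prefix for framework counters and reject user counter names that start with it. The "hadoop." prefix and the CounterNames helper are assumptions made up for this example, not something proposed above or present in Hadoop.

```java
/** Hypothetical naming helper; the prefix is an assumption for illustration. */
final class CounterNames {
    static final String FRAMEWORK_PREFIX = "hadoop.";

    private CounterNames() {}

    /** Name under which the framework would publish one of its own counters. */
    static String framework(String name) {
        return FRAMEWORK_PREFIX + name;
    }

    /** Returns the name unchanged if a user counter may use it, else throws. */
    static String validateUserName(String name) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("counter name must not be empty");
        }
        if (name.startsWith(FRAMEWORK_PREFIX)) {
            throw new IllegalArgumentException(
                "counter names starting with '" + FRAMEWORK_PREFIX + "' are reserved");
        }
        return name;
    }
}
```

With such a rule, the framework counter map_input_bytes would be reported as hadoop.map_input_bytes, and a user-defined map_input_bytes could coexist with it.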
> Global counters
> ---------------
>
> Key: HADOOP-492
> URL: https://issues.apache.org/jira/browse/HADOOP-492
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: arkady borkovsky
> Assigned To: David Bowen
>
> It would be nice to have map / reduce job keep aggregated counts for
> arbitrary events occurring in its tasks -- the number of records processed, the
> number of exceptions of a specific type, the number of sentences in passive
> voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to
> the jobtracker (in some implementations such messages are piggy-backed on the
> heartbeats), so that the job tracker stores the latest values from each
> task and aggregates them on request. It should also make the aggregated
> values available at the job end. The value for a task would be flushed when
> the task fails.
> #491 and #490 may be related to this one.
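
The scheme in the issue description can be sketched roughly as follows. This is a minimal in-memory illustration, not jobtracker code: JobCounterAggregator and its method names are hypothetical, and "flushed when the task fails" is read here as "the failed task's values are discarded", which is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical job-side store: latest counter values per task, summed on request. */
class JobCounterAggregator {
    // task id -> most recent <name, value> pairs reported by that task
    private final Map<String, Map<String, Long>> latestByTask = new HashMap<>();

    /** Called whenever a task reports; replaces that task's previous values. */
    synchronized void report(String taskId, Map<String, Long> values) {
        latestByTask.put(taskId, new HashMap<>(values));
    }

    /** Assumed meaning of "flushed": drop everything the failed task reported. */
    synchronized void taskFailed(String taskId) {
        latestByTask.remove(taskId);
    }

    /** Sums the latest value of each counter across all known tasks. */
    synchronized Map<String, Long> aggregate() {
        Map<String, Long> totals = new TreeMap<>();
        for (Map<String, Long> perTask : latestByTask.values()) {
            for (Map.Entry<String, Long> e : perTask.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

Because each report replaces the task's earlier values rather than adding to them, a task can resend its running totals on every heartbeat without being double-counted.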