[ 
https://issues.apache.org/jira/browse/HADOOP-492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473458
 ] 

David Bowen commented on HADOOP-492:
------------------------------------


This requirement is not an exact match with the Metrics API.  A MetricsRecord 
has a number of capabilities that aren't relevant here:
   * gauges as well as counters
   * adding any number of tags to the data to support various ways of 
aggregating it
   * atomic update of multiple metrics
   * removing metrics

So I don't think it makes sense to expose any aspect of the Metrics API here.  
We can simply add one method to Reporter:

   void incrCounter(String name, long amount);

Behind the scenes, we can automatically send this data to the Metrics API with 
appropriate tags, and also aggregate it into the TaskStatus and JobStatus 
objects so that it is accessible via JobClient.
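
To make the proposal concrete, here is a minimal sketch of how per-task counters might be accumulated and then summed at the job level. It is only an illustration under stated assumptions: CounterReporter, TaskCounters and JobCounters are hypothetical names, not existing Hadoop classes, and the sketch leaves out the forwarding to the Metrics API and the TaskStatus/JobStatus plumbing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for Reporter with the one proposed method.
interface CounterReporter {
    void incrCounter(String name, long amount);
}

// Assumed per-task holder: accumulates the counters of a single task.
class TaskCounters implements CounterReporter {
    private final Map<String, Long> counts = new HashMap<>();

    @Override
    public synchronized void incrCounter(String name, long amount) {
        // Add the increment to the running total, starting from the
        // increment itself if the counter has not been seen before.
        counts.merge(name, amount, Long::sum);
    }

    // Returns a copy so callers cannot modify the live totals.
    public synchronized Map<String, Long> snapshot() {
        return new HashMap<>(counts);
    }
}

// Assumed job-level view: sums the latest snapshot of every task.
class JobCounters {
    static Map<String, Long> aggregate(Iterable<TaskCounters> tasks) {
        Map<String, Long> totals = new HashMap<>();
        for (TaskCounters task : tasks) {
            for (Map.Entry<String, Long> e : task.snapshot().entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return totals;
    }
}
```

User code would only ever see incrCounter, for example reporter.incrCounter("bad_records", 1); the aggregation step would live in the framework.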

We would have some counters that are maintained by the framework.  Currently, 
these would be:

   shuffle_input_bytes
   map_input_records
   map_input_bytes
   map_output_records
   map_output_bytes
   reduce_input_records
   reduce_output_records

Do we need some sort of counter naming convention to prevent future conflicts 
between framework-maintained counters and user-defined counters?
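
One possible answer, offered purely as an illustration and not as a decided convention, would be to reserve a name prefix for framework counters and reject user counter names that use it. The prefix "framework." and the CounterNames helper below are assumptions made for this sketch.

```java
// Hypothetical helper showing one way to keep the two namespaces apart.
class CounterNames {
    // Assumed reserved prefix; the actual convention is still an open question.
    static final String FRAMEWORK_PREFIX = "framework.";

    // Builds the full name of a framework-maintained counter.
    static String framework(String name) {
        return FRAMEWORK_PREFIX + name;
    }

    // Validates a user-defined counter name and returns it unchanged.
    static String user(String name) {
        if (name.startsWith(FRAMEWORK_PREFIX)) {
            throw new IllegalArgumentException(
                "counter name uses the reserved prefix: " + name);
        }
        return name;
    }
}
```

Under this scheme map_input_records would be reported as framework.map_input_records, and a user counter could never collide with a counter the framework adds later.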





> Global counters
> ---------------
>
>                 Key: HADOOP-492
>                 URL: https://issues.apache.org/jira/browse/HADOOP-492
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: arkady borkovsky
>         Assigned To: David Bowen
>
> It would be nice to have a map / reduce job keep aggregated counts for 
> arbitrary events occurring in its tasks -- the number of records processed, the 
> number of exceptions of a specific type, the number of sentences in passive 
> voice, whatever the job finds useful.
> This can be implemented by tasks periodically sending <name, value> pairs to 
> the jobtracker (in some implementations such messages are piggy-backed on the 
> heartbeats), so that the jobtracker stores the latest values from each 
> task and aggregates them on request.  It should also make the aggregated 
> values available at the job end.  The value for a task would be flushed when 
> the task fails.
> #491 and #490 may be related to this one.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
