[ 
https://issues.apache.org/jira/browse/MAPREDUCE-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917821#action_12917821
 ] 

Luke Lu commented on MAPREDUCE-901:
-----------------------------------

The latest patch already handles JobCounter and TaskCounter optimization 
transparently (via the generic FrameworkCounterGroup), but it does not yet 
address file system counter optimization. Using concrete fs enums (hdfs, s3, 
etc.) as in the previous patches is too brittle: the whole mapreduce package 
would need to be recompiled and released for every new distributed filesystem 
implementation. That defeats the purpose of having a filesystem interface, 
through which we can already query for (fs scheme, stats) tuples. 

HADOOP-4188 tried to address the issue, but the treatment is incomplete. The 
Task#getFileSystemCounters helper method is package private and quite awkward 
to use: it requires explicit array indexing, e.g. 
getFileSystemCounters(scheme)[0] to return <SCHEME>_BYTES_READ (e.g. 
HDFS_BYTES_READ), for use with the generic counter interface. It also makes 
decoupled localization of file system counter display names impossible.

I propose that we add a file system counter API to the Counters framework. 
Something like:
{code}
Counter getFileSystemCounter(String scheme, FileSystemCounter key);
{code}

where FileSystemCounter is an enum class:
{code}
public enum FileSystemCounter {
  BYTES_READ,
  BYTES_WRITTEN
  // etc.
}
{code}
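
To make the shape of the proposed API concrete, here is a minimal sketch of a counter group keyed by (scheme, FileSystemCounter). SimpleCounter and FileSystemCounterGroup are hypothetical stand-ins invented for this illustration (they are not classes from the patch), the enum is repeated only so the sketch compiles on its own, and the <SCHEME>_<KEY> naming simply mirrors the HDFS_BYTES_READ convention mentioned above. It assumes Java 8 or later for the lambdas.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; none of these types are taken from the patch.
enum FileSystemCounter {
  BYTES_READ,
  BYTES_WRITTEN
}

/** Minimal stand-in for a counter: a name plus a mutable long value. */
final class SimpleCounter {
  private final String name;
  private long value;

  SimpleCounter(String name) { this.name = name; }

  String getName() { return name; }
  long getValue() { return value; }
  void increment(long delta) { value += delta; }
}

/** Counters are created lazily, so no fs scheme is known at compile time. */
final class FileSystemCounterGroup {
  // scheme -> (key -> counter); TreeMap keeps schemes in a stable order.
  private final Map<String, Map<FileSystemCounter, SimpleCounter>> counters =
      new TreeMap<>();

  SimpleCounter getFileSystemCounter(String scheme, FileSystemCounter key) {
    String s = scheme.toUpperCase(Locale.ROOT);
    return counters
        .computeIfAbsent(s, k -> new EnumMap<>(FileSystemCounter.class))
        .computeIfAbsent(key, k -> new SimpleCounter(s + "_" + k.name()));
  }
}
```

With this shape a caller would write something like group.getFileSystemCounter("hdfs", FileSystemCounter.BYTES_READ).increment(n), and a new filesystem implementation needs no change to the enum.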

We can take advantage of this interface to create a file system counter group 
that is stored in memory and serialized more efficiently (say, as 
(<scheme>, vint(BYTES_READ), vint(BYTES_WRITTEN), ...) tuples).
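
As a rough illustration of the tuple layout, the sketch below writes a tuple count followed by (<scheme>, vint, vint) tuples and reads them back. FsCounterCodec is a hypothetical name, and the variable-length encoding used here is a plain base-128 varint chosen to keep the example self-contained; it is not claimed to match the byte layout of Hadoop's own vint format. Each long[] is assumed to hold {BYTES_READ, BYTES_WRITTEN} in that order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; not code from the patch.
final class FsCounterCodec {
  static final int NUM_FIELDS = 2; // BYTES_READ, BYTES_WRITTEN

  // Base-128 varint: 7 payload bits per byte, high bit set if more bytes follow.
  static void writeVLong(DataOutputStream out, long v) throws IOException {
    while ((v & ~0x7FL) != 0) {
      out.writeByte((int) ((v & 0x7F) | 0x80));
      v >>>= 7;
    }
    out.writeByte((int) v);
  }

  static long readVLong(DataInputStream in) throws IOException {
    long result = 0;
    int shift = 0;
    while (true) {
      int b = in.readUnsignedByte();
      result |= (long) (b & 0x7F) << shift;
      if ((b & 0x80) == 0) {
        return result;
      }
      shift += 7;
    }
  }

  /** Layout: vint(count), then per scheme: UTF string, vint per field. */
  static byte[] encode(Map<String, long[]> byScheme) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      writeVLong(out, byScheme.size());
      for (Map.Entry<String, long[]> e : byScheme.entrySet()) {
        out.writeUTF(e.getKey());
        for (int i = 0; i < NUM_FIELDS; i++) {
          writeVLong(out, e.getValue()[i]);
        }
      }
      out.flush();
      return buf.toByteArray();
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  static Map<String, long[]> decode(byte[] bytes) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      Map<String, long[]> byScheme = new TreeMap<>();
      long count = readVLong(in);
      for (long n = 0; n < count; n++) {
        String scheme = in.readUTF();
        long[] values = new long[NUM_FIELDS];
        for (int i = 0; i < NUM_FIELDS; i++) {
          values[i] = readVLong(in);
        }
        byScheme.put(scheme, values);
      }
      return byScheme;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }
}
```

Because the field order is fixed by the enum, no per-counter name needs to be written; only the scheme string is stored per tuple, and small values take a single byte each.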

Thoughts?

> Move Framework Counters into a TaskMetric structure
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-901
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-901
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Owen O'Malley
>            Assignee: Luke Lu
>         Attachments: 901_1.patch, 901_1.patch, FrameworkCounterGroup.java, 
> MAPREDUCE-901.patch, MAPREDUCE-901.patch, mr-901-trunk-v1.patch
>
>
> I think we should move all of the Counters that the framework updates into a 
> single class called TaskMetrics. TaskMetrics would have specific fields for 
> each of the metrics like input records, input bytes, output records, etc.
> It would both reduce the serialized size of the heartbeats (by shrinking the 
> Counters down to just the user's counters) and decrease the latency for 
> updates to the JobTracker (since Counters are sent at most 1/minute instead 
> of 1/heartbeat).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.