[ 
https://issues.apache.org/jira/browse/HADOOP-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HADOOP-3150:
--------------------------------

    Attachment: 3150.patch

Attached is a patch. This implements the following:

1) Makes the OutputFormat an abstract class with empty implementations for most 
methods:

{code}
public abstract class OutputFormat<K, V> {

  abstract public RecordWriter<K, V> getRecordWriter(JobConf job,
                                     String name, Progressable progress)
  throws IOException;

  abstract public void checkOutputSpecs(JobConf job) throws IOException;

  void commitJobOutput(JobConf job) throws IOException {  }
  
  void discardJobOutput(JobConf job) throws IOException {  }
  
  public boolean isTaskCommitRequired(JobConf job, String attemptId) {
    return false;
  }

  void setTaskWorkOutput(JobConf job, String attemptId)
  throws IOException {  }

  void createTaskWorkOutput(JobConf job, String attemptId)
  throws IOException {  }

  void createJobWorkOutput(JobConf job) throws IOException {  }
  
  void commitTaskOutput(JobConf job, String attemptId) 
  throws IOException {  }

  void discardTaskOutput(JobConf job, String attemptId) 
  throws IOException {  }
}

{code}

2) Removes the FileOutputFormat dependencies from the Task and other framework 
classes. Instead, the patch defines some additional methods in OutputFormat. 
They have a FileOutputFormat flavor, but that should be okay since the default 
implementations are empty. This is open to suggestions.

3) Moves things like saveTaskOutput from Task.java to the FileOutputFormat, 
since that code only ever handled FileOutputFormat output anyway.

4) Adds a blocking RPC call, canCommit. This call blocks at the tasktracker's 
end until the tasktracker hears from the JobTracker what this task should do: 
commit or discard the output. The debatable thing here is that we are blocking 
RPC handlers when a task reaches the commit-pending state. The expectation is 
that we'd hear back from the JobTracker pretty soon, and in any case the 
tasktracker can't do much (like launching new tasks) before it hears from the 
JobTracker. Also, the number of RPC handlers has been increased in the patch. 
There are ways to avoid blocking the RPC handler, but this seemed like a simple 
approach and should not be a big deal since we are dealing with node-local 
RPCs.

5) A whole lot of getRecordWriter-related changes have been made in the patch, 
to do with the removal of the "ignored" parameter from the method.

6) The taskcommit queue code has been removed from the JT. 
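
To illustrate point 2, here is a minimal sketch of how framework code might drive the new hooks without knowing anything about files. It is not code from the patch: SketchOutputFormat, NullSketchOutputFormat, RecordingOutputFormat and TaskDriver are hypothetical names, JobConf is left out, and the order in which the hooks are called is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the patched OutputFormat (assumption: JobConf
// dropped, only the task-level hooks kept). Not the real Hadoop class.
abstract class SketchOutputFormat {
  boolean isTaskCommitRequired(String attemptId) { return false; }
  void createTaskWorkOutput(String attemptId) throws IOException { }
  void commitTaskOutput(String attemptId) throws IOException { }
  void discardTaskOutput(String attemptId) throws IOException { }
}

// A format with nothing to promote: it inherits every empty default.
class NullSketchOutputFormat extends SketchOutputFormat { }

// A format that records which hooks ran, standing in for FileOutputFormat.
class RecordingOutputFormat extends SketchOutputFormat {
  final List<String> calls = new ArrayList<>();

  @Override boolean isTaskCommitRequired(String attemptId) { return true; }
  @Override void createTaskWorkOutput(String attemptId) { calls.add("create:" + attemptId); }
  @Override void commitTaskOutput(String attemptId) { calls.add("commit:" + attemptId); }
  @Override void discardTaskOutput(String attemptId) { calls.add("discard:" + attemptId); }
}

// Hypothetical framework-side driver: it only sees SketchOutputFormat.
class TaskDriver {
  static void runTask(SketchOutputFormat format, String attemptId,
                      boolean jobTrackerSaysCommit) throws IOException {
    format.createTaskWorkOutput(attemptId);
    // ... the task body would write its records here ...
    if (!format.isTaskCommitRequired(attemptId)) {
      return; // nothing to commit or discard
    }
    if (jobTrackerSaysCommit) {
      format.commitTaskOutput(attemptId);
    } else {
      format.discardTaskOutput(attemptId);
    }
  }
}
```

With empty defaults, a format that has no work output simply falls through the driver, which is why the FileOutputFormat flavor of the hooks should be harmless for other formats.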
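
For point 3, the promotion step that moves into FileOutputFormat can be pictured roughly as below. This is only a sketch on the local filesystem using java.nio.file; the real code goes through the Hadoop FileSystem API. TaskOutputPromoter, the directory arguments and the flat (no subdirectories) work directory are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: promotes or discards one task attempt's output.
class TaskOutputPromoter {

  // Moves every entry of workDir into finalDir, then removes workDir.
  static void commitTaskOutput(Path workDir, Path finalDir) throws IOException {
    if (!Files.isDirectory(workDir)) {
      return; // the attempt produced no work output
    }
    Files.createDirectories(finalDir);
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(workDir)) {
      for (Path entry : entries) {
        Files.move(entry, finalDir.resolve(entry.getFileName()),
                   StandardCopyOption.REPLACE_EXISTING);
      }
    }
    Files.delete(workDir);
  }

  // Deletes the attempt's work output without promoting anything.
  static void discardTaskOutput(Path workDir) throws IOException {
    if (!Files.isDirectory(workDir)) {
      return;
    }
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(workDir)) {
      for (Path entry : entries) {
        Files.delete(entry);
      }
    }
    Files.delete(workDir);
  }
}
```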
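
For point 4, the blocking behaviour of canCommit can be sketched as follows. Again, this is not the patch itself: CommitGate and onJobTrackerDecision are hypothetical names, and the use of CompletableFuture is an assumption chosen to keep the example short.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical tasktracker-side gate: one pending decision per task attempt.
class CommitGate {
  private final ConcurrentMap<String, CompletableFuture<Boolean>> decisions =
      new ConcurrentHashMap<>();

  private CompletableFuture<Boolean> decisionFor(String attemptId) {
    return decisions.computeIfAbsent(attemptId, id -> new CompletableFuture<>());
  }

  // Called by the task over the local RPC. The calling (handler) thread is
  // parked here until the JobTracker's answer has been recorded.
  boolean canCommit(String attemptId) {
    return decisionFor(attemptId).join();
  }

  // Called when the tasktracker learns the JobTracker's decision:
  // true means commit the output, false means discard it.
  void onJobTrackerDecision(String attemptId, boolean commit) {
    decisionFor(attemptId).complete(commit);
  }
}
```

Each task waiting in canCommit occupies one handler thread for the whole wait, which is the reason for raising the number of RPC handlers.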

This patch requires testing and may have some bugs at this point. But I am 
hoping that it makes it into 0.18, so could someone please take a quick look 
at the approach? Thanks!

> Move task file promotion into the task
> --------------------------------------
>
>                 Key: HADOOP-3150
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3150
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Devaraj Das
>             Fix For: 0.18.0
>
>         Attachments: 3150.patch
>
>
> We need to move the task file promotion from the JobTracker to the Task and 
> move it down into the output format.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
