[ https://issues.apache.org/jira/browse/HADOOP-1558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511679 ]
Alejandro Abdelnur commented on HADOOP-1558:
--------------------------------------------

Adding answer by email:

-----

Suggestions make sense.

I was looking at the Task class and it seems too Map/Reduce Task specific, so I'll need some help here.

Is it your intention to run the initialize/commit Tasks in the JT box, or should they run in the slaves?

---

I just had another idea that would not require a separate task for initialize/commit, nor running custom code in the JT.

A new interface:

  public interface OutputHandler {
    public void initialize(JobConf conf) throws IOException;
    public void commit(JobConf conf) throws IOException;
    public Path getOutputDirPath(JobConf conf) throws IOException;
  }

Provide 2 implementations of it:

1. FileOutputHandler that does the handling implemented by the patch.
2. NOPOutputHandler that does a no operation.

Add to the OutputFormat interface a method:

  public Class getOutputHandlerClass();

This method must return the OutputHandler implementation the OutputFormat requires. It must be a Hadoop-provided implementation (for now, one of the 2 above).

The JobConf, upon setting the OutputFormat class, will set an internal property with the declared OutputHandler class.

The JobTracker and JobInProgress will use this property to instantiate the OutputHandler and run its initialize/commit methods.

Thus no custom code in the JT and no need for new Task classes.

> changes to OutputFormat to work on temporary directory to enable re-running
> crashed jobs (Issue: 1121)
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1558
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1558
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>         Environment: all
>            Reporter: Alejandro Abdelnur
>         Attachments: hadoop-1558-JUN1007-1934.txt
>
>
> Add OutputFormat methods like:
>   /** Called to initialize output for this job. */
>   void initialize(JobConf job) throws IOException;
>   /** Called to finalize output for this job. */
>   void commit(JobConf job) throws IOException;
> In the base implementation for FileSystem output, initialize() might then
> create a temporary directory for the job, removing any that already exists,
> and commit could rename the temporary output directory to the final name.
> The existing checkOutputSpecs() would continue to throw an exception if the
> final output already exists.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
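
For readers following the thread, the OutputHandler proposal in the comment above could be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the attached patch: SketchConf is a hypothetical stand-in for JobConf, java.nio.file.Path stands in for Hadoop's Path, the two property keys are invented, and returning the final directory from NOPOutputHandler.getOutputDirPath() is a guess, since the comment does not say what it should return.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for JobConf: a bag of string properties.
final class SketchConf {
  private final Map<String, String> props = new HashMap<>();
  void set(String key, String value) { props.put(key, value); }
  String get(String key) { return props.get(key); }
}

// The proposed interface, with SketchConf in place of JobConf.
interface OutputHandler {
  void initialize(SketchConf conf) throws IOException;
  void commit(SketchConf conf) throws IOException;
  Path getOutputDirPath(SketchConf conf) throws IOException;
}

// Tasks write to a temporary sibling directory; commit renames it.
final class FileOutputHandler implements OutputHandler {
  static final String OUTPUT_DIR = "sketch.output.dir"; // invented key

  private static Path finalDir(SketchConf conf) {
    return Paths.get(conf.get(OUTPUT_DIR));
  }

  private static Path tmpDir(SketchConf conf) {
    Path out = finalDir(conf);
    return out.resolveSibling("_tmp_" + out.getFileName());
  }

  public void initialize(SketchConf conf) throws IOException {
    Path tmp = tmpDir(conf);
    if (Files.exists(tmp)) {
      // Remove leftovers from a crashed run, deepest entries first.
      List<Path> stale;
      try (Stream<Path> walk = Files.walk(tmp)) {
        stale = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : stale) {
        Files.delete(p);
      }
    }
    Files.createDirectories(tmp);
  }

  public void commit(SketchConf conf) throws IOException {
    // Fails if the final directory already exists.
    Files.move(tmpDir(conf), finalDir(conf));
  }

  public Path getOutputDirPath(SketchConf conf) {
    return tmpDir(conf);
  }
}

// Does nothing; output goes straight to the final directory (assumption).
final class NOPOutputHandler implements OutputHandler {
  public void initialize(SketchConf conf) { }
  public void commit(SketchConf conf) { }
  public Path getOutputDirPath(SketchConf conf) {
    return Paths.get(conf.get(FileOutputHandler.OUTPUT_DIR));
  }
}

final class OutputHandlerSketch {
  static final String HANDLER_CLASS = "sketch.output.handler.class"; // invented key

  // Roughly what JobConf would do when the OutputFormat class is set.
  static void declareHandler(SketchConf conf, Class<? extends OutputHandler> cls) {
    conf.set(HANDLER_CLASS, cls.getName());
  }

  // Roughly what JobTracker/JobInProgress would do before calling initialize/commit.
  static OutputHandler loadHandler(SketchConf conf) throws ReflectiveOperationException {
    Class<?> cls = Class.forName(conf.get(HANDLER_CLASS));
    return cls.asSubclass(OutputHandler.class).getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    Path base = Files.createTempDirectory("handler-sketch");
    SketchConf conf = new SketchConf();
    conf.set(FileOutputHandler.OUTPUT_DIR, base.resolve("out").toString());
    declareHandler(conf, FileOutputHandler.class);

    OutputHandler handler = loadHandler(conf);
    handler.initialize(conf);
    Files.write(handler.getOutputDirPath(conf).resolve("part-00000"), "data".getBytes());
    handler.commit(conf);
    System.out.println(Files.exists(base.resolve("out").resolve("part-00000")));
  }
}
```

Because the handler class is looked up by name from a property and restricted to known implementations, the tracker side never has to load user-supplied code, which is the point the comment makes about avoiding custom code in the JT.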