[ 
https://issues.apache.org/jira/browse/HADOOP-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606476#action_12606476
 ] 

Doug Cutting commented on HADOOP-3598:
--------------------------------------

The checks for the output directory's existence in each OutputFormat 
implementation are now redundant, no?

Also, each implementation now calls both createWorkDir() and 
getWorkOutputPath(), and these are the only calls to either of these methods, 
always as a pair.  Wouldn't it be simpler to combine these in a single method?  
For example, getWorkOutputPath() could be changed to create the directory, 
guarantee its existence, etc.  This would reduce the amount of code duplicated 
in each FileOutputFormat subclass.
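
A minimal sketch of what the combined method might look like. It uses java.nio.file rather than Hadoop's FileSystem API so that it stands alone; the class name WorkOutput and the method signature are hypothetical and are not taken from the attached patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the helper discussed above; not the actual Hadoop API. */
final class WorkOutput {
    private WorkOutput() {}

    /**
     * Returns the task's work directory, outputDir/_temporary/_taskId,
     * creating it (and any missing parents) before returning.
     */
    static Path getWorkOutputPath(Path outputDir, String taskId) throws IOException {
        Path workDir = outputDir.resolve("_temporary").resolve("_" + taskId);
        // createDirectories succeeds if the directory already exists,
        // so callers need no separate existence check or create call.
        Files.createDirectories(workDir);
        return workDir;
    }
}
```

With something along these lines, each FileOutputFormat subclass would make a single call at the point where it actually opens a writer, and the returned path would be guaranteed to exist.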

> Map-Reduce framework needlessly creates temporary _${taskid} directories for 
> Maps
> ---------------------------------------------------------------------------------
>
>                 Key: HADOOP-3598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3598
>             Project: Hadoop Core
>          Issue Type: Bug
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-3598_0_20080619.patch
>
>
> The staging directory for task outputs (i.e. 
> ${mapred.out.dir}/_temporary/_${taskid}) should only be created when Maps 
> produce output on HDFS, which usually isn't the case. Creating it 
> unconditionally plays very badly with HDFS quotas and may lead to thousands 
> of temp names in the FS namespace, thereby exceeding the quotas. In any case, 
> it isn't good to needlessly create these directories.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
