FileOutputFormat should have a method to create custom files under the 
outputdir with a unique name per task to avoid name collision
------------------------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-3258
                 URL: https://issues.apache.org/jira/browse/HADOOP-3258
             Project: Hadoop Core
          Issue Type: New Feature
          Components: mapred
         Environment: all
            Reporter: Alejandro Abdelnur
            Assignee: Alejandro Abdelnur
             Fix For: 0.18.0
         Attachments: patch-3258.txt

Currently, if M/R code creates a file, that code is responsible for avoiding 
file name collisions between different tasks.

Hadoop should provide an API that creates unique file names based on the task 
type (map or reduce) and the task ID, similar to how the part-##### output 
files are named.

The proposed patch adds two static methods to {{FileOutputFormat}}:

{noformat}
  public static String getUniqueName(JobConf conf, String name);
  public static Path getPathForCustomFile(JobConf conf, String name);
{noformat}

The first appends the task type and the task ID to the given name.
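A minimal sketch of such a naming scheme, assuming the task type is encoded as "m" or "r" and the task ID is reduced to a zero-padded five-digit task number, in the style of part-#####. The class {{UniqueNames}} and its {{uniqueName}} method are hypothetical and are not the code in patch-3258.txt; they take the task information as explicit arguments instead of reading it from a {{JobConf}}.

```java
import java.util.Locale;

// Hypothetical sketch of a unique-name scheme; not the actual patch.
public final class UniqueNames {

    private UniqueNames() {}

    /**
     * Builds "<name>-<m|r>-<NNNNN>", e.g. "stats-m-00003".
     * "m" marks a map task, "r" a reduce task; the task number is
     * zero-padded to five digits, mirroring part-00003.
     */
    public static String uniqueName(boolean isMapTask, int taskNumber, String name) {
        if (taskNumber < 0) {
            throw new IllegalArgumentException("taskNumber must be non-negative");
        }
        String taskType = isMapTask ? "m" : "r";
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s-%s-%05d", name, taskType, taskNumber);
    }

    public static void main(String[] args) {
        System.out.println(uniqueName(true, 3, "stats"));   // stats-m-00003
        System.out.println(uniqueName(false, 12, "stats")); // stats-r-00012
    }
}
```

Because the task type and number are part of the name, map task 3 and reduce task 3 asking for "stats" still get different file names.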

The second returns a {{Path}} to a file in the working output directory, whose 
name has been made unique with the first method.
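The second method only has to resolve the unique name against the working output directory. The sketch below models this with {{java.nio.file.Path}} rather than Hadoop's {{Path}}, so it runs without Hadoop on the classpath; {{CustomFilePaths}}, {{pathForCustomFile}} and the directory used in {{main}} are hypothetical, and the naming scheme is the same assumed name-m|r-NNNNN format.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical sketch of resolving a per-task file name in an output dir.
public final class CustomFilePaths {

    private CustomFilePaths() {}

    // Assumed naming scheme: <name>-<m|r>-<NNNNN>.
    static String uniqueName(boolean isMapTask, int taskNumber, String name) {
        return String.format(Locale.ROOT, "%s-%s-%05d",
                name, isMapTask ? "m" : "r", taskNumber);
    }

    /** Returns workOutputDir/<unique name> for the given task. */
    public static Path pathForCustomFile(Path workOutputDir, boolean isMapTask,
                                         int taskNumber, String name) {
        return workOutputDir.resolve(uniqueName(isMapTask, taskNumber, name));
    }

    public static void main(String[] args) {
        // Hypothetical working output directory for illustration only.
        Path workDir = Paths.get("job-output", "work");
        System.out.println(pathForCustomFile(workDir, false, 2, "stats"));
    }
}
```

Two tasks passing the same name ("stats") end up with different file names in the same directory, so neither overwrites the other's output.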


-- 
This message is automatically generated by JIRA.