dfs -getMerge does not do what it says it does
----------------------------------------------

                 Key: HADOOP-2120
                 URL: https://issues.apache.org/jira/browse/HADOOP-2120
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.14.3
         Environment: All
            Reporter: Milind Bhandarkar
             Fix For: 0.16.0


dfs -getMerge, which calls FileUtil.copyMerge, contains this javadoc:

{code}
Get all the files in the directories that match the source file pattern
   * and merge and sort them to only one file on local fs 
   * srcf is kept.
{code}

However, it only concatenates the set of input files, rather than merging them 
in sorted order.
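
To make the difference concrete, here is a minimal sketch that contrasts plain concatenation with the sorted merge the javadoc describes. It is not the FileUtil code: the class MergeSketch and its two methods are hypothetical names used only for illustration, inputs are in-memory lists of lines rather than DFS files, each input is assumed to be already sorted, and the lambda syntax assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class MergeSketch {

    /** Appends each input after the previous one, with no reordering. */
    static List<String> concatenate(List<List<String>> parts) {
        List<String> out = new ArrayList<>();
        for (List<String> part : parts) {
            out.addAll(part);
        }
        return out;
    }

    /** K-way merge; assumes every input list is already sorted. */
    static List<String> mergeSorted(List<List<String>> parts) {
        // Heap entries are {partIndex, offset}, ordered by the line they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> parts.get(a[0]).get(a[1]).compareTo(parts.get(b[0]).get(b[1])));
        for (int i = 0; i < parts.size(); i++) {
            if (!parts.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> part = parts.get(head[0]);
            out.add(part.get(head[1]));
            // Advance within the same input, if it has more lines.
            if (head[1] + 1 < part.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> parts = Arrays.asList(
            Arrays.asList("apple", "cherry"),
            Arrays.asList("banana", "date"));
        System.out.println(concatenate(parts)); // [apple, cherry, banana, date]
        System.out.println(mergeSorted(parts)); // [apple, banana, cherry, date]
    }
}
```

The first result matches the concatenation behaviour reported here. The second is what "merge and sort" would produce if the inputs were sorted part files.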

Ideally, copyMerge should be equivalent to a map-reduce job with 
IdentityMapper and IdentityReducer and numReducers = 1. However, not having to 
run this as a map-reduce job has some advantages, since avoiding a 
single-reducer job improves cluster utilization during the reduce phase.



