want input sampler & sorted partitioner
---------------------------------------

                 Key: HADOOP-3019
                 URL: https://issues.apache.org/jira/browse/HADOOP-3019
             Project: Hadoop Core
          Issue Type: New Feature
          Components: mapred
            Reporter: Doug Cutting


The input sampler should generate a small, random sample of the input, saved to 
a file.
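
A minimal sketch of such a sampler, not the Hadoop implementation. It assumes
keys are newline-free strings and that the sample file holds one key per line;
the names sample_keys and write_sample are hypothetical. Reservoir sampling is
used here as one way to draw a uniform sample in a single pass without knowing
the input size in advance.

```python
import random


def sample_keys(keys, num_samples, seed=None):
    """Return a uniform random sample of up to num_samples keys (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, key in enumerate(keys):
        if i < num_samples:
            # Fill the reservoir with the first num_samples keys.
            reservoir.append(key)
        else:
            # Keep the new key with probability num_samples / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < num_samples:
                reservoir[j] = key
    return reservoir


def write_sample(sample, path):
    """Save the sample sorted, one key per line (assumed file format)."""
    with open(path, "w", encoding="utf-8") as f:
        for key in sorted(sample):
            f.write(f"{key}\n")
```

Writing the sample in sorted order is a design choice of this sketch: it lets a
reader pick split points without sorting again.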

The partitioner should read the sample file and partition keys into relatively 
even-sized key-ranges, where the partition numbers correspond to key order.
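
The partitioning step could be sketched as follows, assuming a sample file with
one string key per line (an assumed format) and hypothetical function names.
Split points are taken at evenly spaced positions in the sorted sample, and a
binary search maps each key to a partition, so a larger partition number always
holds larger keys.

```python
import bisect


def choose_split_points(sorted_samples, num_partitions):
    """Pick num_partitions - 1 split points at evenly spaced sample positions."""
    n = len(sorted_samples)
    if n == 0:
        raise ValueError("sample is empty")
    return [sorted_samples[(i * n) // num_partitions]
            for i in range(1, num_partitions)]


def read_split_points(path, num_partitions):
    """Read the sample file (one key per line, assumed) and derive split points."""
    with open(path, encoding="utf-8") as f:
        samples = sorted(line.rstrip("\n") for line in f)
    return choose_split_points(samples, num_partitions)


def get_partition(key, split_points):
    """Return the index of the key range containing key.

    Keys below the first split point go to partition 0; keys at or above
    the last split point go to the last partition.
    """
    return bisect.bisect_right(split_points, key)
```

If the sample contains many duplicate keys, adjacent split points can coincide
and some partitions will receive no keys; this sketch does not handle that case.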

Note that when the sampler is used for partitioning, the number of samples 
required is proportional to the number of reduce partitions.  A sample of 
roughly 10x the intended reducer count should give good results.
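
As a rough illustration of that sizing rule, the sketch below draws 10 samples
per reducer from an in-memory list of distinct keys and counts how many keys
land in each resulting range. The function names and the in-memory setup are
assumptions for illustration only; how balanced the output is depends on the
key distribution.

```python
import bisect
import random


def suggested_sample_count(num_reducers, samples_per_reducer=10):
    """Sample size proportional to the reducer count (10x by default)."""
    return num_reducers * samples_per_reducer


def partition_sizes(keys, num_reducers, samples_per_reducer=10, seed=0):
    """Sample keys, derive split points, and count keys per partition.

    Assumes keys is a non-empty list.
    """
    rng = random.Random(seed)
    k = min(len(keys), suggested_sample_count(num_reducers, samples_per_reducer))
    sample = sorted(rng.sample(keys, k))
    # Evenly spaced positions in the sorted sample become the split points.
    splits = [sample[(i * k) // num_reducers] for i in range(1, num_reducers)]
    sizes = [0] * num_reducers
    for key in keys:
        sizes[bisect.bisect_right(splits, key)] += 1
    return sizes


if __name__ == "__main__":
    # 4 reducers -> 40 samples drawn from 10,000 distinct keys.
    print(partition_sizes(list(range(10000)), 4))
```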
