Mohit Sabharwal created PIG-4565:
------------------------------------

             Summary: Support custom MR partitioners for Spark engine 
                 Key: PIG-4565
                 URL: https://issues.apache.org/jira/browse/PIG-4565
             Project: Pig
          Issue Type: Sub-task
          Components: spark
    Affects Versions: spark-branch
            Reporter: Mohit Sabharwal
            Assignee: Mohit Sabharwal
             Fix For: spark-branch


Shuffle operations such as DISTINCT, GROUP, JOIN, and CROSS allow a custom MR 
partitioner to be specified with the PARTITION BY clause.

Example:
{code}
-- Pig Latin: request the custom partitioner on a shuffle operation
B = GROUP A BY $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2;

// Java: the MapReduce partitioner referenced above
public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
    //@Override
    public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
        if (key.getValueAsPigType() instanceof Integer) {
            // Integer keys are partitioned by their value
            return ((Integer) key.getValueAsPigType()).intValue() % numPartitions;
        } else {
            // All other keys fall back to the hash code
            return key.hashCode() % numPartitions;
        }
    }
}
{code}

Since Spark's shuffle APIs take a different partitioner class 
(org.apache.spark.Partitioner) than MapReduce 
(org.apache.hadoop.mapreduce.Partitioner), we need to wrap custom partitioners 
written for MapReduce inside a Spark Partitioner.
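
The wrapping idea can be sketched roughly as follows. This is only an 
illustration, not the patch itself. To keep it self-contained, MrPartitioner 
and SparkPartitioner are hypothetical stand-ins that mirror the method 
signatures of org.apache.hadoop.mapreduce.Partitioner and 
org.apache.spark.Partitioner; MrPartitionerAdapter and ModPartitioner are 
likewise made-up names. Passing null as the value and normalizing the result 
with Math.floorMod are assumptions of this sketch.

```java
// Stand-in mirroring org.apache.hadoop.mapreduce.Partitioner<K, V> (illustration only).
abstract class MrPartitioner<K, V> {
    abstract int getPartition(K key, V value, int numPartitions);
}

// Stand-in mirroring org.apache.spark.Partitioner (illustration only).
// The real Spark class is also Serializable, which a real wrapper must respect.
abstract class SparkPartitioner {
    abstract int numPartitions();
    abstract int getPartition(Object key);
}

// Hypothetical adapter: exposes an MR-style partitioner through the Spark-style interface.
final class MrPartitionerAdapter<K, V> extends SparkPartitioner {
    private final MrPartitioner<K, V> delegate;
    private final int numPartitions;

    MrPartitionerAdapter(MrPartitioner<K, V> delegate, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.delegate = delegate;
        this.numPartitions = numPartitions;
    }

    @Override
    int numPartitions() {
        return numPartitions;
    }

    @Override
    @SuppressWarnings("unchecked")
    int getPartition(Object key) {
        // The Spark-style call only supplies the key, so no value is available here.
        // Assumption: the wrapped partitioner ignores the value, so null is passed.
        int partition = delegate.getPartition((K) key, null, numPartitions);
        // Defensive choice of this sketch: map negative results into [0, numPartitions).
        return Math.floorMod(partition, numPartitions);
    }
}

// Hypothetical MR-style partitioner used to exercise the adapter.
final class ModPartitioner extends MrPartitioner<Integer, Object> {
    @Override
    int getPartition(Integer key, Object value, int numPartitions) {
        return key % numPartitions;
    }
}
```

With these stand-ins, new MrPartitionerAdapter<>(new ModPartitioner(), 2) 
could be handed to any code that expects the Spark-style interface. In Pig 
itself the adapter would also have to convert Spark's shuffle keys into 
PigNullableWritable before delegating; that step is left out here.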



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
