[jira] [Commented] (PIG-4565) Support custom MR partitioners for Spark engine

liyunzhang_intel (JIRA) Sat, 23 May 2015 07:01:26 -0700

    [ 
https://issues.apache.org/jira/browse/PIG-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557347#comment-14557347
 ]


liyunzhang_intel commented on PIG-4565:
---------------------------------------

[~mohitsabharwal]:
  Can you check PIG-4565.patch? After I apply it and compile, the
 following error appears:
/home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkUtil.java:66:
 error: cannot find symbol
    [javac]     public static <T> Seq<T> toScalaSeq(List<T> list) {

> Support custom MR partitioners for Spark engine 
> ------------------------------------------------
>
>                 Key: PIG-4565
>                 URL: https://issues.apache.org/jira/browse/PIG-4565
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>    Affects Versions: spark-branch
>            Reporter: Mohit Sabharwal
>            Assignee: Mohit Sabharwal
>             Fix For: spark-branch
>
>         Attachments: PIG-4565.patch
>
>
> Shuffle operations like DISTINCT, GROUP, JOIN, CROSS allow custom MR 
> partitioners to be specified.
> Example:
> {code}
> B = GROUP A BY $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2;
>
> public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
>     //@Override
>     public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
>         if (key.getValueAsPigType() instanceof Integer) {
>             int ret = (((Integer) key.getValueAsPigType()).intValue() % numPartitions);
>             return ret;
>         } else {
>             return (key.hashCode()) % numPartitions;
>         }
>     }
> }
> {code}
> Since Spark's shuffle APIs take a different partitioner class 
> (org.apache.spark.Partitioner) compared to MapReduce 
> (org.apache.hadoop.mapreduce.Partitioner), we need to wrap custom 
> partitioners written for MapReduce inside a Spark Partitioner.
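
For illustration, the wrapping described in the issue could look roughly like the
sketch below. This is not the code in PIG-4565.patch. To keep it compilable without
Hadoop or Spark on the classpath, MrPartitioner and SparkPartitioner are hypothetical
stand-ins that only mirror the method shapes of
org.apache.hadoop.mapreduce.Partitioner and org.apache.spark.Partitioner;
MrPartitionerWrapper and ModPartitioner are likewise made-up names. The sketch
assumes the wrapped partitioner ignores the value argument, since Spark hands the
partitioner only the key.

```java
import java.util.Objects;

// Stand-in (assumption) for org.apache.hadoop.mapreduce.Partitioner<K, V>.
abstract class MrPartitioner<K, V> {
    public abstract int getPartition(K key, V value, int numPartitions);
}

// Stand-in (assumption) for org.apache.spark.Partitioner.
abstract class SparkPartitioner {
    public abstract int numPartitions();
    public abstract int getPartition(Object key);
}

// Hypothetical adapter: exposes an MR-style partitioner through the Spark-style API.
final class MrPartitionerWrapper<K, V> extends SparkPartitioner {
    private final MrPartitioner<K, V> delegate;
    private final int numPartitions;

    MrPartitionerWrapper(MrPartitioner<K, V> delegate, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.delegate = Objects.requireNonNull(delegate, "delegate");
        this.numPartitions = numPartitions;
    }

    @Override
    public int numPartitions() {
        return numPartitions;
    }

    @Override
    @SuppressWarnings("unchecked")
    public int getPartition(Object key) {
        // Only the key is available here, so the value is passed as null
        // (assumption: the wrapped partitioner does not inspect it).
        int p = delegate.getPartition((K) key, null, numPartitions);
        // Java's % can yield a negative result for negative operands;
        // normalize into the range [0, numPartitions).
        return Math.floorMod(p, numPartitions);
    }
}

// Hypothetical Integer-keyed analogue of SimpleCustomPartitioner, for demonstration.
final class ModPartitioner extends MrPartitioner<Integer, Object> {
    @Override
    public int getPartition(Integer key, Object value, int numPartitions) {
        return key % numPartitions;
    }
}
```

With these stand-ins, new MrPartitionerWrapper<>(new ModPartitioner(), 2) could be
passed wherever a SparkPartitioner is expected. The floorMod step is a design choice
of this sketch: it guards against a wrapped partitioner returning a negative index
for negative keys or hash codes.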



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
