[ https://issues.apache.org/jira/browse/PIG-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14557347#comment-14557347 ]
liyunzhang_intel commented on PIG-4565:
---------------------------------------
[~mohitsabharwal]:
Can you check PIG-4565.patch? After I applied it and compiled, the following error appeared:
{noformat}
/home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/spark/SparkUtil.java:66: error: cannot find symbol
[javac] public static <T> Seq<T> toScalaSeq(List<T> list) {
{noformat}
> Support custom MR partitioners for Spark engine
> ------------------------------------------------
>
> Key: PIG-4565
> URL: https://issues.apache.org/jira/browse/PIG-4565
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Affects Versions: spark-branch
> Reporter: Mohit Sabharwal
> Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4565.patch
>
>
> Shuffle operations such as DISTINCT, GROUP, JOIN, and CROSS allow a custom MR
> partitioner to be specified.
> Example:
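> {code}
> B = GROUP A BY $0 PARTITION BY org.apache.pig.test.utils.SimpleCustomPartitioner PARALLEL 2;
> {code}
> {code}
> package org.apache.pig.test.utils;
>
> import org.apache.hadoop.io.Writable;
> import org.apache.hadoop.mapreduce.Partitioner;
> import org.apache.pig.impl.io.PigNullableWritable;
>
> public class SimpleCustomPartitioner extends Partitioner<PigNullableWritable, Writable> {
>     @Override
>     public int getPartition(PigNullableWritable key, Writable value, int numPartitions) {
>         if (key.getValueAsPigType() instanceof Integer) {
>             // Integer keys: partition by value modulo the partition count
>             // (note: % yields a negative result for negative keys)
>             int ret = ((Integer) key.getValueAsPigType()).intValue() % numPartitions;
>             return ret;
>         } else {
>             // Other keys: fall back to the key's hash code
>             return key.hashCode() % numPartitions;
>         }
>     }
> }
> {code}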
> Since Spark's shuffle APIs take a different partitioner class
> (org.apache.spark.Partitioner) than MapReduce
> (org.apache.hadoop.mapreduce.Partitioner), we need to wrap custom
> partitioners written for MapReduce inside a Spark Partitioner.
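A minimal sketch of the wrapping described in the issue above, not the code in PIG-4565.patch. So that it compiles without Hadoop or Spark jars, `MrPartitioner` and `SparkPartitioner` are local stand-ins that only mirror the method shapes of org.apache.hadoop.mapreduce.Partitioner (`getPartition(key, value, numPartitions)`) and org.apache.spark.Partitioner (`numPartitions()`, `getPartition(key)`). `MapReducePartitionerWrapper` and `ModuloPartitioner` are hypothetical names used for illustration only.

```java
import java.util.Objects;

public class PartitionerWrapperSketch {

    /** Stand-in mirroring the shape of org.apache.hadoop.mapreduce.Partitioner<K, V>. */
    public abstract static class MrPartitioner<K, V> {
        public abstract int getPartition(K key, V value, int numPartitions);
    }

    /** Stand-in mirroring the shape of org.apache.spark.Partitioner. */
    public abstract static class SparkPartitioner {
        public abstract int numPartitions();

        public abstract int getPartition(Object key);
    }

    /** Hypothetical adapter: exposes an MR-style partitioner through the Spark-style API. */
    public static class MapReducePartitionerWrapper<K, V> extends SparkPartitioner {
        private final MrPartitioner<K, V> mrPartitioner;
        private final int numPartitions;

        public MapReducePartitionerWrapper(MrPartitioner<K, V> mrPartitioner, int numPartitions) {
            if (numPartitions <= 0) {
                throw new IllegalArgumentException("numPartitions must be positive");
            }
            this.mrPartitioner = Objects.requireNonNull(mrPartitioner, "mrPartitioner");
            this.numPartitions = numPartitions;
        }

        @Override
        public int numPartitions() {
            return numPartitions;
        }

        @Override
        @SuppressWarnings("unchecked")
        public int getPartition(Object key) {
            // A Spark-style partitioner only receives the key, so the MR "value" argument is null.
            int partition = mrPartitioner.getPartition((K) key, null, numPartitions);
            // Reject ids outside [0, numPartitions) instead of handing them to the shuffle.
            if (partition < 0 || partition >= numPartitions) {
                throw new IllegalStateException(
                        "partition " + partition + " out of range for " + numPartitions + " partitions");
            }
            return partition;
        }
    }

    /** Hypothetical MR-style partitioner, loosely modelled on SimpleCustomPartitioner. */
    public static class ModuloPartitioner extends MrPartitioner<Object, Object> {
        @Override
        public int getPartition(Object key, Object value, int numPartitions) {
            // floorMod keeps the result non-negative, unlike %, for negative keys or hash codes.
            if (key instanceof Integer) {
                return Math.floorMod(((Integer) key).intValue(), numPartitions);
            }
            return Math.floorMod(Objects.hashCode(key), numPartitions);
        }
    }

    public static void main(String[] args) {
        SparkPartitioner p = new MapReducePartitionerWrapper<>(new ModuloPartitioner(), 2);
        System.out.println(p.numPartitions());   // 2
        System.out.println(p.getPartition(7));   // 1
        System.out.println(p.getPartition(-3));  // 1
        System.out.println(p.getPartition("a")); // 1 ("a".hashCode() == 97)
    }
}
```

One consequence of this design is that a Spark partitioner never sees the value, so an MR partitioner whose logic inspects `value` cannot be supported by a key-only wrapper like this one; the sketch passes `null` for it.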