[ https://issues.apache.org/jira/browse/PIG-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803582#comment-15803582 ]
liyunzhang_intel edited comment on PIG-4858 at 1/6/17 5:07 AM:
---------------------------------------------------------------
[~nkollar]: I guess you mean the following, where I marked "Here apply the
patch from PIG-3417"? I have updated the patch in PIG-5044, and you can view the
whole code on the review board for that patch.
SparkCompiler#getSamplingJob
{code}
private SparkOperator getSamplingJob(POSort sort, SparkOperator sampleOperator,
        List<PhysicalPlan> transformPlans, int rp,
        String udfClassName, String[] udfArgs)
        throws PlanException, VisitorException, ExecException {
    addSampleOperatorForSkewedJoin(sampleOperator);
    List<Boolean> flat1 = new ArrayList<Boolean>();
    List<PhysicalPlan> eps1 = new ArrayList<PhysicalPlan>();
    // if transform plans are not specified, project the columns of sorting keys
    if (transformPlans == null) {
        ......
    } else {
        for (int i = 0; i < transformPlans.size(); i++) {
            eps1.add(transformPlans.get(i));
            flat1.add(i == transformPlans.size() - 1 ? true : false);
            // Here apply the patch from PIG-3417
        }
    }
{code}
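To make the else branch easier to follow, here is a minimal standalone sketch of how the flatten flags end up being set: one flag per transform plan, with only the last one true. The class and method names (FlattenFlagsSketch, flattenFlags) are made up for illustration and are not part of Pig or of the patch; the plan count stands in for transformPlans.size().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from SparkCompiler.
class FlattenFlagsSketch {

    // Returns one flag per plan; only the last plan is marked for flattening,
    // mirroring flat1.add(i == transformPlans.size() - 1 ? true : false).
    static List<Boolean> flattenFlags(int planCount) {
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < planCount; i++) {
            flags.add(i == planCount - 1);
        }
        return flags;
    }

    public static void main(String[] args) {
        // With three transform plans, prints [false, false, true]
        System.out.println(flattenFlags(3));
    }
}
```

With no transform plans the list is simply empty, which is why the null case is handled separately in the if branch.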
was (Author: kellyzly):
[~nkollar]: I guess you mean the following, where I marked "Here apply the
patch from PIG-3417"?
SparkCompiler#getSamplingJob
{code}
private SparkOperator getSamplingJob(POSort sort, SparkOperator sampleOperator,
        List<PhysicalPlan> transformPlans, int rp,
        String udfClassName, String[] udfArgs)
        throws PlanException, VisitorException, ExecException {
    addSampleOperatorForSkewedJoin(sampleOperator);
    List<Boolean> flat1 = new ArrayList<Boolean>();
    List<PhysicalPlan> eps1 = new ArrayList<PhysicalPlan>();
    // if transform plans are not specified, project the columns of sorting keys
    if (transformPlans == null) {
        ......
    } else {
        for (int i = 0; i < transformPlans.size(); i++) {
            eps1.add(transformPlans.get(i));
            flat1.add(i == transformPlans.size() - 1 ? true : false);
            // Here apply the patch from PIG-3417
        }
    }
{code}
> Implement Skewed join for spark engine
> --------------------------------------
>
> Key: PIG-4858
> URL: https://issues.apache.org/jira/browse/PIG-4858
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch
>
> Attachments: PIG-4858.patch, PIG-4858_2.patch, PIG-4858_3.patch,
> SkewedJoinInSparkMode.pdf
>
>
> Currently a regular join is used in place of the skewed join. The skewed join needs to be implemented.