[
https://issues.apache.org/jira/browse/PIG-5207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953426#comment-15953426
]
Adam Szita commented on PIG-5207:
---------------------------------
[~rohini]: in our case the order of the edges in {{mToEdges}} (from one node to
the others) matters, because the other nodes are POProject operators, each of
which has a specific column set in it. These columns must follow the order that
the user specifies when calling the COR function, e.g.
{{COR(var0col, var1col, var2col)}}. That order is held by the underlying
implementation, which is basically an {{ArrayList<PhysicalOperator>}}.
You can test this with the following use case, which is similar to what we have
in this E2E test:
{code}
PhysicalPlan plan = new PhysicalPlan();
//Creating ops
PhysicalOperator proj0 = new POProject(new OperatorKey("scope",0),1,1);
PhysicalOperator proj1 = new POProject(new OperatorKey("scope",1),1,0);
PhysicalOperator proj2 = new POProject(new OperatorKey("scope",2),1,1);
PhysicalOperator proj3 = new POProject(new OperatorKey("scope",3),1,1);
PhysicalOperator proj4 = new POProject(new OperatorKey("scope",4),1,1);
PhysicalOperator proj5 = new POProject(new OperatorKey("scope",5),1,2);
POUserFunc udfOp = new POUserFunc(new OperatorKey("scope",6), 1,
        Lists.newArrayList(proj1,proj3,proj5),
        new FuncSpec(COR.class.getCanonicalName()));
//Adding and connecting ops
plan.add(proj0);
plan.add(proj1);
plan.connect(proj0, proj1);
plan.add(proj2);
plan.add(proj3);
plan.connect(proj2, proj3);
plan.add(proj4);
plan.add(proj5);
plan.connect(proj4, proj5);
plan.add(udfOp);
plan.connect(proj1, udfOp);
plan.connect(proj3, udfOp);
plan.connect(proj5, udfOp);
PhysicalPlan clonedPlan = plan.clone();
//mToEdges is protected...
Field f = OperatorPlan.class.getDeclaredField("mToEdges");
f.setAccessible(true);
MultiMap originalToEdgesMap = (MultiMap)f.get(plan);
MultiMap clonedToEdgesMap = (MultiMap)f.get(clonedPlan);
System.out.println("Original column order");
for (Object op : originalToEdgesMap.keySet()){
    if (op instanceof POUserFunc) {
        for (Object entry : (List)(originalToEdgesMap.get(op))){
            System.out.println(((POProject)entry).getColumn());
        }
    }
}
System.out.println("Cloned column order");
for (Object op : clonedToEdgesMap.keySet()){
    if (op instanceof POUserFunc) {
        for (Object entry : (List)(clonedToEdgesMap.get(op))){
            System.out.println(((POProject)entry).getColumn());
        }
    }
}
{code}
This gives me:
{code}
Original column order
0
1
2
Cloned column order
2
0
1
{code}
The plan constructed in the example is:
{code}
POUserFunc(org.apache.pig.builtin.COR)[tuple] - scope-6
|
|---Project[tuple][0] - scope-1
|   |
|   |---Project[tuple][1] - scope-0
|
|---Project[tuple][1] - scope-3
|   |
|   |---Project[tuple][1] - scope-2
|
|---Project[tuple][2] - scope-5
    |
    |---Project[tuple][1] - scope-4
{code}
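The reordering above can be illustrated without Pig. The sketch below is only an assumption about the mechanism, not something confirmed in this comment: it supposes that the input list of a consumer is rebuilt by iterating over a map of outgoing edges, so the rebuilt order follows the map's iteration order rather than the order in which the edges were originally connected. The class {{EdgeOrderDemo}}, the methods {{connectInUserOrder}} and {{rebuildToEdges}}, and the string labels are all hypothetical stand-ins for the plan and its operators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EdgeOrderDemo {

    // Hypothetical stand-in for connecting proj1, proj3, proj5 to the UDF
    // in the order the user wrote the arguments.
    public static Map<String, List<String>> connectInUserOrder(
            Map<String, List<String>> fromEdges) {
        for (String proj : Arrays.asList("proj1", "proj3", "proj5")) {
            fromEdges.computeIfAbsent(proj, k -> new ArrayList<>()).add("udf");
        }
        return fromEdges;
    }

    // Rebuilds consumer -> ordered inputs by walking producer -> consumers.
    // The resulting input order is whatever order fromEdges iterates in.
    public static Map<String, List<String>> rebuildToEdges(
            Map<String, List<String>> fromEdges) {
        Map<String, List<String>> toEdges = new HashMap<>();
        for (Map.Entry<String, List<String>> e : fromEdges.entrySet()) {
            for (String consumer : e.getValue()) {
                toEdges.computeIfAbsent(consumer, k -> new ArrayList<>())
                       .add(e.getKey());
            }
        }
        return toEdges;
    }

    public static void main(String[] args) {
        // HashMap gives no ordering guarantee, so this list may be reordered.
        Map<String, List<String>> hashed = connectInUserOrder(new HashMap<>());
        // LinkedHashMap iterates in insertion order, so the user order survives.
        Map<String, List<String>> linked = connectInUserOrder(new LinkedHashMap<>());

        System.out.println("HashMap-backed rebuild:       "
                + rebuildToEdges(hashed).get("udf"));
        System.out.println("LinkedHashMap-backed rebuild: "
                + rebuildToEdges(linked).get("udf"));
    }
}
```

With an insertion-ordered map the rebuilt list always matches the user's argument order. With a plain hash map the result depends on the keys' hash codes, so it may or may not match, which would be enough to break a function whose arguments are positional.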
> BugFix e2e tests fail on spark
> ------------------------------
>
> Key: PIG-5207
> URL: https://issues.apache.org/jira/browse/PIG-5207
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Adam Szita
> Assignee: Adam Szita
> Fix For: spark-branch
>
> Attachments: PIG-5207.0.patch
>
>
> Observed ClassCastException in BugFix 1 and 2 test cases. The exception is
> thrown from a UDF: COR.Final
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)