[ https://issues.apache.org/jira/browse/PIG-4644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14654139#comment-14654139 ]
Anthony Hsu commented on PIG-4644:
----------------------------------
We're using a custom loader. The (simplified) user script looks something like this:
{code}
a = LOAD 'data' USING CustomLoader();
a = foreach a {
b = foreach foo generate c.d#'e';
generate b;
};
b = filter a by foo is not null;
c = filter a by foo is null;
d = UNION b,c;
dump d;
{code}
The physical plan looks like this:
{code}
#-----------------------------------------------
# Physical Plan:
#-----------------------------------------------
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: Filter[bag] - scope-10
    |       |   |
    |       |   Constant(true) - scope-11
    |       |
    |       |---a: Split - scope-9
    |           |
    |           |---a: New For Each(false)[bag] - scope-8
    |               |   |
    |               |   RelationToExpressionProject[bag][*] - scope-1
    |               |   |
    |               |   |---foo: New For Each(false)[bag] - scope-7
    |               |       |   |
    |               |       |   POMapLookUp[chararray] - scope-5
    |               |       |   |
    |               |       |   |---Project[map][0] - scope-4
    |               |       |       |
    |               |       |       |---Project[tuple][4] - scope-3
    |               |       |
    |               |       |---Project[bag][0] - scope-2
    |               |
    |               |---a: Load(data:CustomLoader) - scope-0
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: Filter[bag] - scope-16
            |   |
            |   Constant(true) - scope-17
            |
            |---a: Split - scope-9
                |
                |---a: New For Each(false)[bag] - scope-8
                    |   |
                    |   RelationToExpressionProject[bag][*] - scope-1
                    |   |
                    |   |---foo: New For Each(false)[bag] - scope-7
                    |       |   |
                    |       |   POMapLookUp[chararray] - scope-5
                    |       |   |
                    |       |   |---Project[map][0] - scope-4
                    |       |       |
                    |       |       |---Project[tuple][4] - scope-3
                    |       |
                    |       |---Project[bag][0] - scope-2
                    |
                    |---a: Load(data:CustomLoader) - scope-0
{code}
and the MapReduce plan looks like this:
{code}
#--------------------------------------------------
# Map Reduce Plan
#--------------------------------------------------
MapReduce node scope-29
Map Plan
d: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-22
|
|---d: Union[bag] - scope-21
    |
    |---b: Filter[bag] - scope-12
    |   |   |
    |   |   Not[boolean] - scope-15
    |   |   |
    |   |   |---POIsNull[boolean] - scope-14
    |   |       |
    |   |       |---Project[bag][0] - scope-13
    |   |
    |   |---a: New For Each(false)[bag] - scope-44
    |       |   |
    |       |   RelationToExpressionProject[bag][*] - scope-42
    |       |   |
    |       |   |---foo: New For Each(false)[bag] - scope-41
    |       |       |   |
    |       |       |   POMapLookUp[chararray] - scope-38
    |       |       |   |
    |       |       |   |---Project[map][0] - scope-40
    |       |       |       |
    |       |       |       |---Project[tuple][4] - scope-39
    |       |       |
    |       |       |---Project[bag][0] - scope-43
    |       |
    |       |---a: Load(data:CustomLoader) - scope-45
    |
    |---c: Filter[bag] - scope-18
        |   |
        |   POIsNull[boolean] - scope-20
        |   |
        |   |---Project[bag][0] - scope-19
        |
        |---a: New For Each(false)[bag] - scope-36
            |   |
            |   RelationToExpressionProject[bag][*] - scope-34
            |   |
            |   |---foo: New For Each(false)[bag] - scope-33
            |       |   |
            |       |   POMapLookUp[chararray] - scope-30
            |       |   |
            |       |   |---Project[map][0] - scope-32
            |       |       |
            |       |       |---Project[tuple][4] - scope-31
            |       |
            |       |---Project[bag][0] - scope-35
            |
            |---a: Load(data:CustomLoader) - scope-37--------
Global sort: false
----------------
{code}
I haven't been able to reproduce this issue using PigStorage and some sample
data. When I try, the physical plan looks the same, but the MR plan ends up
with two MR jobs instead of one, and the issue doesn't surface.
> POProject's implementation of clone seems broken
> ------------------------------------------------
>
> Key: PIG-4644
> URL: https://issues.apache.org/jira/browse/PIG-4644
> Project: Pig
> Issue Type: Bug
> Reporter: Ratandeep Ratti
>
> We are receiving the following exception when using Pig:
> {noformat}
> Caused by: java.lang.ClassCastException: org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject cannot be cast to org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:144)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.PORelationToExprProject.clone(PORelationToExprProject.java:50)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:639)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.clone(POForEach.java:53)
>     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.clone(PhysicalPlan.java:227)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.mergeDiamondMROper(MultiQueryOptimizer.java:298)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visitMROp(MultiQueryOptimizer.java:219)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:273)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:46)
>     at org.apache.pig.impl.plan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:71)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer.visit(MultiQueryOptimizer.java:94)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:629)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:148)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> {noformat}
> On further investigation, it seems that POProject's clone method is
> implemented as:
> {noformat}
> @Override
> public POProject clone() throws CloneNotSupportedException {
>     ArrayList<Integer> cols = new ArrayList<Integer>(columns.size());
>     // Can reuse the same Integer objects, as they are immutable
>     for (Integer i : columns) {
>         cols.add(i);
>     }
>     POProject clone = new POProject(new OperatorKey(mKey.scope,
>             NodeIdGenerator.getGenerator().getNextNodeId(mKey.scope)),
>             requestedParallelism, cols);
>     clone.cloneHelper(this);
>     clone.overloaded = overloaded;
>     clone.startCol = startCol;
>     clone.isProjectToEnd = isProjectToEnd;
>     clone.resultType = resultType;
>     return clone;
> }
> {noformat}
> It uses a constructor to create the clone, which breaks the (weakly enforced)
> convention that clone() should obtain the new object by calling super.clone().
> The subclass PORelationToExprProject implements clone() as:
> {noformat}
> @Override
> public PORelationToExprProject clone() throws CloneNotSupportedException {
>     return (PORelationToExprProject) super.clone();
> }
> {noformat}
> As POProject's implementation of clone() shows, super.clone() never returns
> an object of type PORelationToExprProject, so the cast in
> PORelationToExprProject.clone() fails with the ClassCastException above.
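The failure can be reproduced without Pig. Below is a minimal, self-contained Java sketch; the class names (CtorCloneProject, CtorCloneRelationProject, SuperCloneProject, SuperCloneRelationProject, CloneDemo) are hypothetical and model only the cloning pattern, not Pig's actual operators, fields, or constructors. The first pair mirrors the quoted POProject/PORelationToExprProject code and throws the same kind of ClassCastException. The second pair obtains the copy from super.clone() (Object.clone(), which allocates an instance of the receiver's runtime class) and then copies the mutable column list, so the subclass cast succeeds. This is one possible fix direction, not necessarily the change made for PIG-4644.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for POProject / PORelationToExprProject: they model only
// the cloning pattern, not Pig's real fields, constructors, or operator keys.

/** Base class whose clone() builds the copy with a constructor, like POProject. */
class CtorCloneProject implements Cloneable {
    final List<Integer> columns;

    CtorCloneProject(List<Integer> columns) {
        this.columns = columns;
    }

    @Override
    public CtorCloneProject clone() throws CloneNotSupportedException {
        // Always allocates a CtorCloneProject, even when invoked via a subclass.
        return new CtorCloneProject(new ArrayList<Integer>(columns));
    }
}

/** Subclass that, like PORelationToExprProject, casts the result of super.clone(). */
class CtorCloneRelationProject extends CtorCloneProject {
    CtorCloneRelationProject(List<Integer> columns) {
        super(columns);
    }

    @Override
    public CtorCloneRelationProject clone() throws CloneNotSupportedException {
        // Throws ClassCastException: super.clone() returned a plain CtorCloneProject.
        return (CtorCloneRelationProject) super.clone();
    }
}

/** Same hierarchy, but the base clone() follows the Object.clone() convention. */
class SuperCloneProject implements Cloneable {
    List<Integer> columns; // not final, so clone() can replace it with a copy

    SuperCloneProject(List<Integer> columns) {
        this.columns = columns;
    }

    @Override
    public SuperCloneProject clone() throws CloneNotSupportedException {
        // Object.clone() allocates an instance of the receiver's runtime class.
        SuperCloneProject copy = (SuperCloneProject) super.clone();
        copy.columns = new ArrayList<Integer>(columns); // copy the mutable list
        return copy;
    }
}

class SuperCloneRelationProject extends SuperCloneProject {
    SuperCloneRelationProject(List<Integer> columns) {
        super(columns);
    }

    @Override
    public SuperCloneRelationProject clone() throws CloneNotSupportedException {
        return (SuperCloneRelationProject) super.clone(); // cast now succeeds
    }
}

class CloneDemo {
    /** True if cloning the constructor-based subclass throws ClassCastException. */
    static boolean ctorCloneThrows() {
        try {
            new CtorCloneRelationProject(Arrays.asList(0)).clone();
            return false;
        } catch (ClassCastException e) {
            return true;
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the super.clone()-based copy keeps its subclass and owns its list. */
    static boolean superCloneKeepsSubclass() {
        try {
            SuperCloneProject original = new SuperCloneRelationProject(Arrays.asList(0));
            SuperCloneProject copy = original.clone();
            return copy instanceof SuperCloneRelationProject
                    && copy.columns != original.columns
                    && copy.columns.equals(original.columns);
        } catch (CloneNotSupportedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("constructor-based clone throws: " + ctorCloneThrows()); // true
        System.out.println("super.clone() keeps subclass: " + superCloneKeepsSubclass()); // true
    }
}
```

Running CloneDemo.main prints true for both checks. The design point: a constructor call inside clone() hard-codes the runtime type, so any subclass that casts the result of super.clone() breaks, whereas delegating to Object.clone() preserves the receiver's class and leaves only mutable fields (here, the column list) to be copied explicitly.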
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)