[ https://issues.apache.org/jira/browse/PIG-4737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rohini Palaniswamy updated PIG-4737:
------------------------------------
Attachment: PIG-4737-fixtestfailures.patch
In the initial patch I had replaced new Result(POStatus.STATUS_EOP, null); with the
RESULT_EOP constant everywhere; previously the constant was used only in some places.
POSort.java (nested foreach without the secondary key optimization) was modifying
RESULT_EOP's returnStatus to STATUS_OK. That led to OOM errors: the iteration never
terminated and the same value kept getting added to the bag. The patch also has some
other minor changes that I made while investigating the failures.
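A minimal sketch of why mutating a shared result constant is dangerous. The classes below are hypothetical, simplified stand-ins for Pig's Result and POStatus (the real ones have more members and may use different status values), and sortLikeOperatorBug() only imitates the kind of in-place status change described above; it is not the POSort code. Once the shared EOP instance has been flipped to STATUS_OK, every consumer that reads until EOP sees OK instead, so a read-until-EOP loop keeps adding the same value. The loop is capped by a hypothetical maxIterations parameter so the demo terminates.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedEopDemo {
    // Hypothetical status codes; not necessarily Pig's actual POStatus values.
    static final byte STATUS_OK = 0;
    static final byte STATUS_EOP = 3;

    // Hypothetical, simplified stand-in for Pig's Result: the fields are mutable.
    static class Result {
        byte returnStatus;
        Object result;

        Result(byte returnStatus, Object result) {
            this.returnStatus = returnStatus;
            this.result = result;
        }
    }

    // A single shared instance, analogous to a RESULT_EOP constant.
    static final Result RESULT_EOP = new Result(STATUS_EOP, null);

    // Hypothetical exhausted input: every call signals end of processing
    // by handing back the shared constant.
    static Result getNext() {
        return RESULT_EOP;
    }

    // Imitates an operator that changes the status of a Result in place.
    // Because the object is the shared constant, every later caller is affected.
    static void sortLikeOperatorBug() {
        Result res = RESULT_EOP;
        res.returnStatus = STATUS_OK;
        res.result = "same tuple";
    }

    // Reads until EOP and adds each value to the bag. maxIterations is a
    // safety cap so the demo terminates instead of running out of memory.
    static int drain(List<Object> bag, int maxIterations) {
        int n = 0;
        while (n < maxIterations) {
            Result r = getNext();
            if (r.returnStatus == STATUS_EOP) {
                break;
            }
            bag.add(r.result);
            n++;
        }
        return n;
    }

    // Safer alternative: a new EOP Result per call, so mutating one
    // instance cannot leak into other callers.
    static Result freshEop() {
        return new Result(STATUS_EOP, null);
    }

    public static void main(String[] args) {
        List<Object> bag = new ArrayList<>();
        System.out.println("before corruption: " + drain(bag, 5));
        sortLikeOperatorBug();
        System.out.println("after corruption:  " + drain(bag, 5));
        System.out.println("bag: " + bag);
    }
}
```

Running main() prints 0 for the first drain and 5 for the second, with the bag holding five copies of the same value. Returning a fresh instance (as freshEop() does) or never mutating a Result that may be shared are two ways to avoid this; the sketch does not claim either is how the patch itself resolved it.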
Failures from https://builds.apache.org/job/Pig-trunk-commit/2278/testReport/ fixed by this patch:
TestLimitVariable.java
TestExampleGenerator.java
TestGrunt.java
TestSecondarySortMR.java
TestSecondarySortTez.java
TestPruneColumn.java
The following failures existed before this patch:
TestEvalPipeline.testLimit
TestBZip.java
TestLoad.java
TestPigTest.java
Except for TestBZip.java, these are due to the newly introduced
StoreFuncDecorator not working with FetchLauncher. I will deal with them in a
separate JIRA.
> Check and fix clone implementation for all classes extending PhysicalOperator
> -----------------------------------------------------------------------------
>
> Key: PIG-4737
> URL: https://issues.apache.org/jira/browse/PIG-4737
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4737-1.patch, PIG-4737-2.patch,
> PIG-4737-fixtestfailures.patch
>
>
> PhysicalOperator.clone() eventually calls Object.clone(), which only does
> a shallow copy (the javadoc wrongly says deep copy), and this causes issues with
> UnionOptimizer in Tez. Most of the clone implementations have already been fixed
> because of issues found earlier, but I recently ran into an issue with POStream
> where, after cloning, the same references to binaryOutputQueue and binaryInputQueue
> were retained, which caused the script to hang.
> Usually the cloned operators in a union go to different Tez vertex plans and the
> issue would not have occurred. In this particular case, however, because of the
> replicated joins and the combination of multi-query and union optimization, both
> cloned plans of the union ended up in the same vertex (the one that loads C). That
> single vertex handles both replicated joins and the streaming in two
> sub-plans of a split, and stores the final result in g.
> {code}
> A = LOAD 'a';
> B = LOAD 'b';
> C = LOAD 'c';
> D = JOIN C by $0, A by $0 using 'replicated';
> E = JOIN C by $0, B by $0 using 'replicated';
> F = UNION D, E;
> G = STREAM F through ....
> STORE G into 'g';
> {code}
> It would be good to go through all classes extending PhysicalOperator and check
> whether each one deep clones its fields that are not of primitive types.
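Object.clone() copies each field's value, so every reference-typed field in the copy points at the same object as in the original. The sketch below uses a hypothetical ShallowOp class whose queue fields are named after POStream's binaryInputQueue and binaryOutputQueue (it is not Pig's POStream), a DeepOp subclass whose clone() allocates new queues, and a hypothetical reflection helper, sharedReferenceFields(), that lists non-static, non-primitive fields whose references are identical after cloning. A helper like this could be one way to audit the classes extending PhysicalOperator, assuming each can be instantiated in a test.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class CloneCheckDemo {

    // Hypothetical stand-in for an operator like POStream; only the queue
    // field names are borrowed from the issue description.
    static class ShallowOp implements Cloneable {
        String alias = "G";
        BlockingQueue<Object> binaryInputQueue = new LinkedBlockingQueue<>();
        BlockingQueue<Object> binaryOutputQueue = new LinkedBlockingQueue<>();

        @Override
        public ShallowOp clone() throws CloneNotSupportedException {
            // Object.clone() copies field values, so the clone shares both queues.
            return (ShallowOp) super.clone();
        }
    }

    static class DeepOp extends ShallowOp {
        @Override
        public DeepOp clone() throws CloneNotSupportedException {
            DeepOp copy = (DeepOp) super.clone();
            // Give the clone its own queues instead of the original's.
            copy.binaryInputQueue = new LinkedBlockingQueue<>();
            copy.binaryOutputQueue = new LinkedBlockingQueue<>();
            return copy;
        }
    }

    // Lists non-static, non-primitive fields (including inherited ones) that
    // refer to the very same object in the original and the clone.
    // Immutable values such as String are reported too; judge those by hand.
    static List<String> sharedReferenceFields(Object original, Object clone)
            throws IllegalAccessException {
        List<String> shared = new ArrayList<>();
        for (Class<?> c = original.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                    continue;
                }
                f.setAccessible(true);
                Object value = f.get(original);
                if (value != null && value == f.get(clone)) {
                    shared.add(c.getSimpleName() + "." + f.getName());
                }
            }
        }
        return shared;
    }

    public static void main(String[] args) throws Exception {
        ShallowOp shallow = new ShallowOp();
        System.out.println("shallow clone shares: " + sharedReferenceFields(shallow, shallow.clone()));
        DeepOp deep = new DeepOp();
        System.out.println("deep clone shares:    " + sharedReferenceFields(deep, deep.clone()));
    }
}
```

For ShallowOp both queues are reported; for DeepOp only alias is reported, which is harmless because String is immutable. The helper cannot tell safe sharing (immutable or intentionally shared objects) from unsafe sharing, so each reported field still needs a manual decision.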
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)