[ 
https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3492:
------------------------------

    Attachment: pig-3492-trunk_04.patch

bq. compilePp in dumpSchema makes couple of tests fail returning empty schema.

Thanks Rohini.  I guess my shortcut didn't work for some tests...  For 0.11, we 
might want to take out the compilePp hack and keep the rest.  That would at 
least make 0.11 as good (or bad) as 0.10 in terms of the bugs we're seeing (the 
'describe' bug would remain, but the LOGenerate/LOJoin bug would be fixed).

Now, for 0.12 and the long term, I'm taking the latter approach from my 
previous comment.

bq. (i-2) We can either fix it by forcing compilePp() before describe or moving 
ImplicitSplitInserter/DuplicateForEachColumnRewrite to PigServer.compile().

ImplicitSplitInserter/DuplicateForEachColumnRewrite seem to be essential to 
the correctness of compilation rather than real optimizations.  With that 
assumption, I rewrote them as Visitors and moved them from 
LogicalPlanOptimizer to PigServer.compile.

With this change, 38 unit tests started failing.
* 5 tests failed with a NullPointerException in Illustrate.  I believe this 
was due to a bug in pig/pen/LineageTrimmingVisitor.java.  Added an entry for 
LOSplitOutput to fix it.
* 1 test in TestOptimizeLimit failed when typecasting LOLimit to LOForeach.  
This was due to the logical plan changing now that 
ImplicitSplitInserter/DuplicateForEachColumnRewrite are applied by default, 
irrespective of the optimizer the test picks.
* 32 failures in TestMultiQueryCompiler/TestMultiQueryLocal, since the tests 
were counting the number of logical operators.  Updated the tests after making 
sure the increase comes only from LOSplit and LOSplitOutput.
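
To illustrate why the operator counts go up once the split rewrite always 
runs, here is a minimal toy sketch in Java.  It is not Pig code: ToyPlan and 
its methods are hypothetical names, and it assumes the rewrite adds one split 
operator plus one split-output operator per branch whenever an operator feeds 
more than one consumer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: ToyPlan is a hypothetical stand-in, not a Pig class.
class ToyPlan {
    // Operator name -> names of its successors, in insertion order.
    final Map<String, List<String>> successors = new LinkedHashMap<>();

    void add(String op) {
        successors.putIfAbsent(op, new ArrayList<>());
    }

    void connect(String from, String to) {
        add(from);
        add(to);
        successors.get(from).add(to);
    }

    // Number of operators in the plan (what the toy "tests" would count).
    int size() {
        return successors.size();
    }

    // Mandatory pass, run regardless of which optimizer rules are enabled:
    // for every operator with two or more successors, insert one Split
    // and one SplitOutput per branch (assumed behavior, see lead-in).
    void insertImplicitSplits() {
        // Snapshot the keys because the loop adds new operators to the map.
        for (String op : new ArrayList<>(successors.keySet())) {
            List<String> outs = new ArrayList<>(successors.get(op));
            if (outs.size() < 2) {
                continue;
            }
            String split = "Split(" + op + ")";
            successors.get(op).clear();
            connect(op, split);
            for (int i = 0; i < outs.size(); i++) {
                String out = "SplitOutput(" + op + "#" + i + ")";
                connect(split, out);
                connect(out, outs.get(i));
            }
        }
    }

    // One load feeding two filters, each followed by a store.
    static ToyPlan example() {
        ToyPlan p = new ToyPlan();
        p.connect("Load", "FilterA");
        p.connect("Load", "FilterB");
        p.connect("FilterA", "StoreA");
        p.connect("FilterB", "StoreB");
        return p;
    }
}
```

For the example plan, size() is 5 before insertImplicitSplits() and 8 after 
it; the three extra operators are the one Split and the two SplitOutputs, 
which is the kind of increase the updated tests were checked against.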

Trying to get e2e running with this patch.


> ColumnPrune dropping used column due to 
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3492
>                 URL: https://issues.apache.org/jira/browse/PIG-3492
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1, 0.12.1, 0.13.0
>            Reporter: Koji Noguchi
>         Attachments: pig-3492-trunk_04.patch, pig-3492-v0.12_01.patch
>
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema -> 
> LogicalRelationalOperator.fixDuplicateUids() creating a new UID but that UID 
> is not propagated to the entire plan (since SplitFilter.reportChanges only 
> returns subplan).
> As a result, I am seeing ColumnPruning cutting off those used columns.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
