[
https://issues.apache.org/jira/browse/PIG-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Koji Noguchi updated PIG-3492:
------------------------------
Attachment: pig-3492-trunk_04.patch
bq. compilePp in dumpSchema makes couple of tests fail returning empty schema.
Thanks, Rohini. I guess my shortcut didn't work for some tests. For 0.11, we
might want to take out the compilePp hack and keep the rest. That would at
least make 0.11 as good (or bad) as 0.10 in terms of the bugs we're seeing:
the 'describe' bug would remain, but the LOGenerate/LOJoin bug would be fixed.
Now, for 0.12 and the long term, I am taking the latter approach from my
previous comment.
bq. (i-2) We can either fix it by forcing compilePp() before describe or moving
ImplicitSplitInserter/DuplicateForEachColumnRewrite to PigServer.compile().
ImplicitSplitInserter/DuplicateForEachColumnRewrite seem to be an essential
part of compilation, needed for correctness rather than as an optimization.
On that assumption, I rewrote them as Visitors and moved them from
LogicalPlanOptimizer to PigServer.compile.
With this change, 38 unit tests started failing.
* 5 tests failed with a NullPointerException in Illustrate. I believe this was
due to a bug in pig/pen/LineageTrimmingVisitor.java. I added an entry for
LOSplitOutput to fix it.
* 1 TestOptimizeLimit test failed on a typecast from LOLimit to LOForeach.
This was due to the change in the logical plan, which now always has
ImplicitSplitInserter/DuplicateForEachColumnRewrite applied, irrespective of
the optimizer the test picks.
* 32 tests failed in TestMultiQueryCompiler/TestMultiQueryLocal because they
count the number of logical operators. I updated the tests after making sure
the increase comes only from LOSplit and LOSplitOutput.
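
To illustrate why the operator counts in those tests go up, here is a minimal
toy sketch of an always-on implicit-split pass. It is not the patch and does
not use Pig's LogicalPlan API: the Plan class, insertImplicitSplits and
countWithPrefix are hypothetical names made up for this example, and operators
are modelled as plain strings. It only assumes the behaviour described above,
namely that an operator feeding several consumers gets an explicit LOSplit
plus one LOSplitOutput per consumer.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: hypothetical classes, not Pig's LogicalPlan API.
public class ImplicitSplitSketch {

    /** A plan is a DAG: operator name -> names of its successors. */
    static final class Plan {
        final Map<String, List<String>> successors = new LinkedHashMap<>();

        void addOperator(String name) {
            successors.putIfAbsent(name, new ArrayList<>());
        }

        void connect(String from, String to) {
            addOperator(from);
            addOperator(to);
            successors.get(from).add(to);
        }

        int size() {
            return successors.size();
        }
    }

    /**
     * Mandatory pass (assumed behaviour): wherever one operator feeds
     * several consumers, insert an explicit split node plus one split
     * output per consumer. Operators with at most one consumer are copied.
     */
    static Plan insertImplicitSplits(Plan input) {
        Plan out = new Plan();
        for (Map.Entry<String, List<String>> e : input.successors.entrySet()) {
            String op = e.getKey();
            List<String> succs = e.getValue();
            out.addOperator(op);
            if (succs.size() <= 1) {
                for (String s : succs) {
                    out.connect(op, s);
                }
                continue;
            }
            String split = "LOSplit(" + op + ")";
            out.connect(op, split);
            for (int i = 0; i < succs.size(); i++) {
                String splitOutput = "LOSplitOutput(" + op + "#" + i + ")";
                out.connect(split, splitOutput);
                out.connect(splitOutput, succs.get(i));
            }
        }
        return out;
    }

    /** Counts operators whose name starts with the given prefix. */
    static long countWithPrefix(Plan plan, String prefix) {
        return plan.successors.keySet().stream()
                .filter(name -> name.startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // One load feeding two filters, each stored separately.
        Plan plan = new Plan();
        plan.connect("LOLoad", "LOFilter_1");
        plan.connect("LOLoad", "LOFilter_2");
        plan.connect("LOFilter_1", "LOStore_1");
        plan.connect("LOFilter_2", "LOStore_2");

        Plan rewritten = insertImplicitSplits(plan);
        long splits = countWithPrefix(rewritten, "LOSplit(");
        long splitOutputs = countWithPrefix(rewritten, "LOSplitOutput(");

        System.out.println("operators before: " + plan.size());
        System.out.println("operators after:  " + rewritten.size());
        System.out.println("added LOSplit: " + splits
                + ", added LOSplitOutput: " + splitOutputs);
    }
}
```

In this toy plan the whole increase in operator count is accounted for by the
inserted split and split-output nodes, which is the same check I applied
before updating the expected counts in the multi-query tests.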
I am now trying to get the e2e tests running with this patch.
> ColumnPrune dropping used column due to
> LogicalRelationalOperator.fixDuplicateUids changes not propagating
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-3492
> URL: https://issues.apache.org/jira/browse/PIG-3492
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1, 0.12.1, 0.13.0
> Reporter: Koji Noguchi
> Attachments: pig-3492-trunk_04.patch, pig-3492-v0.12_01.patch
>
>
> I don't have a testcase I can upload at the moment, but here's my observation.
> SplitFilter -> schemaResetter -> LOGenerate.getSchema ->
> LogicalRelationalOperator.fixDuplicateUids() creates a new UID, but that UID
> is not propagated to the entire plan (since SplitFilter.reportChanges only
> returns the subplan).
> As a result, I am seeing ColumnPruning cut off those used columns.