[
https://issues.apache.org/jira/browse/PIG-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai resolved PIG-4646.
-----------------------------
Resolution: Fixed
Assignee: Daniel Dai
Fix Version/s: 0.12.0
Thanks for reporting.
The script runs correctly with 0.12.0+. I am not sure which patch to credit
for the fix. Marking as fixed.
> PushUpFilter should not push before nested projection with FILTER operators
> ---------------------------------------------------------------------------
>
> Key: PIG-4646
> URL: https://issues.apache.org/jira/browse/PIG-4646
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.11.1
> Reporter: Haishan Liu
> Assignee: Daniel Dai
> Fix For: 0.12.0
>
>
> Verified the problem in 0.11.1. In short, a filter should not be pushed before
> a nested foreach that contains another filter operator. See the
> following minimal example:
> {code}
> cat data;
> (1, {(1000, 'a'), (1001, 'b')})
> (2, {(2000, 'a'), (2001, 'b'), (2002, 'c')})
> A = load 'data' as (id:int, hits:{(score:int, name:chararray)});
> B = foreach A {
>     filtered = filter hits by score > 2000;
>     generate id, filtered;
> };
> dump B;
> (1,{})
> (2,{(2001,'b'),(2002,'c')})
> C = filter B by SIZE(filtered) > 0;
> dump C;
> (1,{})
> (2,{(2001,'b'),(2002,'c')})
> {code}
> The desired result can be achieved either by passing '-optimizer_off PushUpFilter'
> when invoking Pig, or with the following more convoluted workaround:
> {code}
> C = foreach B generate SIZE(filtered) as size, id, filtered;
> D = filter C by size > 0;
> E = foreach D generate id, filtered;
> dump E;
> (2,{(2001,'b'),(2002,'c')})
> {code}
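
For readers without a 0.11.1 install, the reported behaviour can be modelled in plain Python. This is only a sketch of the semantics, not Pig's optimizer code: the function names are hypothetical, and the "pushed" ordering is an assumption about what the rewritten plan effectively computes, chosen because it reproduces the output shown in the report.

```python
# Rows mirror the 'data' file from the report: (id, bag of (score, name)).
rows = [
    (1, [(1000, "a"), (1001, "b")]),
    (2, [(2000, "a"), (2001, "b"), (2002, "c")]),
]

def nested_foreach(rows):
    # Models: B = foreach A { filtered = filter hits by score > 2000;
    #                         generate id, filtered; };
    return [(id_, [hit for hit in bag if hit[0] > 2000]) for id_, bag in rows]

def size_filter(rows):
    # Models: C = filter B by SIZE(filtered) > 0;
    return [(id_, bag) for id_, bag in rows if len(bag) > 0]

# Intended order: the nested filter runs first, then the SIZE check.
expected = size_filter(nested_foreach(rows))

# Assumed model of the faulty plan: the SIZE check is evaluated before the
# nested foreach, so it sees the original, unfiltered bag.
pushed = nested_foreach(size_filter(rows))

print(expected)  # only id 2 survives
print(pushed)    # id 1 is kept with an empty bag, as in the report
```

In this model the two orderings differ exactly on the row whose bag becomes empty only after the nested filter, which is why the outer filter cannot safely be moved ahead of the nested foreach.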