[ https://issues.apache.org/jira/browse/PIG-4646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai resolved PIG-4646.
-----------------------------
       Resolution: Fixed
         Assignee: Daniel Dai
    Fix Version/s: 0.12.0

Thanks for reporting. The script runs correctly with 0.12.0 and later. I am not sure which patch to credit for the fix, so I have marked the issue as fixed in 0.12.0.

> PushUpFilter should not push before nested projection with FILTER operators
> ---------------------------------------------------------------------------
>
>                 Key: PIG-4646
>                 URL: https://issues.apache.org/jira/browse/PIG-4646
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11.1
>            Reporter: Haishan Liu
>            Assignee: Daniel Dai
>             Fix For: 0.12.0
>
>
> Verified the problem in 0.11.1. In short, a filter should not be pushed before
> a nested foreach in which another filter operator is present. See the
> following minimal example:
> {code}
> cat data;
> (1, {(1000, 'a'), (1001, 'b')})
> (2, {(2000, 'a'), (2001, 'b'), (2002, 'c')})
> A = load 'data' as (id:int, hits:{(score:int, name:chararray)});
> B = foreach A {
>     filtered = filter hits by score > 2000;
>     generate id, filtered;
> };
> dump B;
> (1,{})
> (2,{(2001,'b'),(2002,'c')})
> C = filter B by SIZE(filtered) > 0;
> dump C;
> (1,{})
> (2,{(2001,'b'),(2002,'c')})
> {code}
> The desired result, containing only the row for id 2, can be achieved either
> by passing '-optimizer_off PushUpFilter' when invoking Pig, or with the
> following convoluted workaround:
> {code}
> C = foreach B generate SIZE(filtered) as size, id, filtered;
> D = filter C by size > 0;
> E = foreach D generate id, filtered;
> dump E;
> (2,{(2001,'b'),(2002,'c')})
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)