Johannes Schwenk created PIG-2722:
-------------------------------------

Summary: UDF FilterFunc in expression using OR right hand side gets ignored
Key: PIG-2722
URL: https://issues.apache.org/jira/browse/PIG-2722
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.1
Environment: pig-0.8.1, hadoop-0.20.2 from Cloudera's distribution cdh3u3 on Kubuntu 12.04 64-bit.
Reporter: Johannes Schwenk

The following Pig script does not produce the expected output:

{noformat}
register adition.jar
a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, additional:int, referer:chararray);
b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');
EXPLAIN b;
dump b;
{noformat}

TestCONTAINS-testFilteringCluster-input.txt contains

{noformat}
1 23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
2 123 42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
3 223 142 http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
4 323 242 http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
5 423 342 http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
6 523 442 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
{noformat}

The {{adition.jar}} has been built against the Cloudera cdh3u3 distribution and contains the filter function {{CONTAINS}}, see here http://pastebin.com/Uwje7v1V . The output can be seen here http://pastebin.com/yXY17mXx .
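
As a cross-check that does not depend on Pig, the intended semantics of the FILTER expression can be sketched in plain Java over the same six input rows. This is only an illustration: the class name {{ExpectedFilter}}, the helper {{contains}}, and the reading of {{CONTAINS}} as a plain substring test are assumptions, not the code of the actual UDF (which is at the pastebin link above).

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedFilter {
    // The six sample rows from the report: id, grp, additional, referer.
    static final String[] ROWS = {
        "1 23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers",
        "2 123 42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers",
        "3 223 142 http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca",
        "4 323 242 http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama",
        "5 423 342 http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama",
        "6 523 442 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident"
    };

    // Assumed stand-in for the CONTAINS UDF: a plain, null-safe substring test.
    static boolean contains(String text, String needle) {
        return text != null && needle != null && text.contains(needle);
    }

    // Returns the ids of rows whose referer contains 'obama' OR 'praesident'.
    static List<Integer> matchingIds() {
        List<Integer> ids = new ArrayList<>();
        for (String row : ROWS) {
            // Split into at most four whitespace-separated fields.
            String[] fields = row.split("\\s+", 4);
            String referer = fields[3];
            if (contains(referer, "obama") || contains(referer, "praesident")) {
                ids.add(Integer.parseInt(fields[0]));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(matchingIds()); // prints [4, 5, 6]
    }
}
```

Under these assumptions, rows 4, 5 and 6 satisfy the OR expression.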

Essentially, the right hand side of the OR in the FILTER expression is being ignored, so the script returns just two lines

{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
{noformat}

instead of three lines

{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
(6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
{noformat}

Running the script with Pig 0.11.0 yields correct results: http://pastebin.com/Cr5CkHui

See also the discussion on the pig-user mailing list: http://www.mail-archive.com/user%40pig.apache.org/msg05278.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira