[ https://issues.apache.org/jira/browse/PIG-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284239#comment-13284239 ]
Dmitriy V. Ryaboy commented on PIG-2722:
----------------------------------------

Verified that the issue does not exist for 0.9 onwards; at this point my advice to anyone encountering this issue is to upgrade.

> UDF FilterFunc in expression using OR right hand side gets ignored
> ------------------------------------------------------------------
>
> Key: PIG-2722
> URL: https://issues.apache.org/jira/browse/PIG-2722
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1
> Environment: pig-0.8.1, hadoop-0.20.2 from Cloudera's distribution cdh3u3 on Kubuntu 12.04 64-bit.
> Reporter: Johannes Schwenk
>
> The following pig script does not produce the expected output:
> {noformat}
> register adition.jar
> a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, additional:int, referer:chararray);
> b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');
> EXPLAIN b;
> dump b;
> {noformat}
> TestCONTAINS-testFilteringCluster-input.txt contains
> {noformat}
> 1 23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
> 2 123 42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
> 3 223 142 http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
> 4 323 242 http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
> 5 423 342 http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
> 6 523 442 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
> {noformat}
> The {{adition.jar}} has been built against the Cloudera cdh3u3 distribution and contains the filter function {{CONTAINS}}, see here http://pastebin.com/Uwje7v1V .
> The output can be seen here http://pastebin.com/yXY17mXx .
> Essentially what is happening is that the right-hand side of the OR in the FILTER expression is being ignored, resulting in the script returning just two lines
> {noformat}
> (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
> (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
> {noformat}
> instead of three lines
> {noformat}
> (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
> (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
> (6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
> {noformat}
> Running the script with pig 0.11.0 yields correct results http://pastebin.com/Cr5CkHui
> See also the discussion on the pig-user mailing list http://www.mail-archive.com/user%40pig.apache.org/msg05278.html
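
For reference, the OR semantics the report expects can be sketched in plain Java, without Pig. This is only an illustration under stated assumptions: the class `OrFilterSketch` and its `contains` helper are hypothetical stand-ins that treat the `CONTAINS` UDF as a simple substring match. The actual UDF source is in the pastebin linked in the report and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

class OrFilterSketch {
    // Referer column of the six input rows from the report (ids 1..6).
    static final String[] REFERERS = {
        "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama",
        "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident"
    };

    // Hypothetical stand-in for the CONTAINS UDF (assumption: substring match).
    static boolean contains(String haystack, String needle) {
        return haystack != null && needle != null && haystack.contains(needle);
    }

    // Ids of rows passing: contains(referer, left) OR contains(referer, right).
    static List<Integer> matchingIds(String left, String right) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < REFERERS.length; i++) {
            if (contains(REFERERS[i], left) || contains(REFERERS[i], right)) {
                ids.add(i + 1); // ids in the input file are 1-based
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Both sides of the OR are evaluated, so row 6 is included.
        System.out.println(matchingIds("obama", "praesident")); // prints [4, 5, 6]
    }
}
```

With both operands honoured, rows 4, 5 and 6 pass the filter, matching the three-line output the report expects. Dropping the right-hand operand leaves only rows 4 and 5, which is the behaviour observed on 0.8.1.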