[ 
https://issues.apache.org/jira/browse/PIG-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284239#comment-13284239
 ] 

Dmitriy V. Ryaboy commented on PIG-2722:
----------------------------------------

Verified that the issue does not exist for 0.9 onwards; at this point my advice 
to anyone encountering this issue is to upgrade.
                
> UDF FilterFunc in expression using OR right hand side gets ignored
> ------------------------------------------------------------------
>
>                 Key: PIG-2722
>                 URL: https://issues.apache.org/jira/browse/PIG-2722
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>         Environment: pig-0.8.1, hadoop-0.20.2 from Cloudera's distribution 
> cdh3u3 on Kubuntu 12.04 64-bit. 
>            Reporter: Johannes Schwenk
>
> The following Pig script does not produce the expected output:
> {noformat}
> register adition.jar
> a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, 
> additional:int, referer:chararray);
> b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') 
> OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');
> EXPLAIN b;
> dump b;
> {noformat}
> TestCONTAINS-testFilteringCluster-input.txt contains 
> {noformat}
> 1  23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
> 2  123   42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
> 3  223   142   http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
> 4  323   242   http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
> 5  423   342   http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
> 6  523   442   http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
> {noformat}
> The {{adition.jar}} has been built against the Cloudera cdh3u3 distribution
> and contains the filter function {{CONTAINS}}, see here 
> http://pastebin.com/Uwje7v1V .
> The output can be seen here http://pastebin.com/yXY17mXx . Essentially what 
> is happening is that the right-hand side of the OR in the FILTER expression 
> is being ignored, resulting in the script returning just two lines 
> {noformat}
> (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
> (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
> {noformat}
> instead of three lines
> {noformat}
> (4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
> (5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
> (6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
> {noformat}
> Running the script with Pig 0.11.0 yields correct results:
> http://pastebin.com/Cr5CkHui
> See also the discussion on the pig-user mailing list:
> http://www.mail-archive.com/user%40pig.apache.org/msg05278.html
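
For reference, the expected result of the FILTER can be checked outside Pig. The following is a minimal sketch in Python, assuming that CONTAINS behaves as a plain case-sensitive substring test (the actual UDF source is only linked from the report, not reproduced here). The names `contains` and `filter_rows` are illustrative and not part of Pig or of adition.jar.

```python
# Sample rows from the report: (id, grp, additional, referer).
ROWS = [
    (1, 23, 42, "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers"),
    (2, 123, 42, "http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers"),
    (3, 223, 142, "http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca"),
    (4, 323, 242, "http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama"),
    (5, 423, 342, "http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama"),
    (6, 523, 442, "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident"),
]


def contains(haystack, needle):
    """Assumed stand-in for the CONTAINS UDF: a plain substring test."""
    return haystack is not None and needle in haystack


def filter_rows(rows, needles):
    """Keep rows whose referer matches ANY needle (logical OR)."""
    return [row for row in rows if any(contains(row[3], n) for n in needles)]


# Both operands of the OR evaluated: the result a correct Pig version should give.
both_sides = [row[0] for row in filter_rows(ROWS, ["obama", "praesident"])]

# Only the left operand evaluated: matches the two rows reported for 0.8.1.
left_only = [row[0] for row in filter_rows(ROWS, ["obama"])]

print("OR of both operands:", both_sides)
print("left operand only:  ", left_only)
```

Under that assumption, the full OR selects ids 4, 5 and 6, while the left operand alone selects only ids 4 and 5, which is consistent with the right-hand side being dropped in 0.8.1.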

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
