Johannes Schwenk created PIG-2722:
-------------------------------------

             Summary: UDF FilterFunc in expression using OR right hand side gets ignored
                 Key: PIG-2722
                 URL: https://issues.apache.org/jira/browse/PIG-2722
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1
         Environment: pig-0.8.1, hadoop-0.20.2 from Cloudera's distribution cdh3u3 on Kubuntu 12.04 64-bit.
            Reporter: Johannes Schwenk


The following Pig script does not produce the expected output:


{noformat}
register adition.jar

a = LOAD 'TestCONTAINS-testFilteringCluster-input.txt' AS (id:int, grp:int, additional:int, referer:chararray);
b = FILTER a BY com.adition.pig.filtering.string.CONTAINS(referer, 'obama') OR com.adition.pig.filtering.string.CONTAINS(referer, 'praesident');

EXPLAIN b;

dump b;
{noformat}

The input file TestCONTAINS-testFilteringCluster-input.txt contains:

{noformat}
1  23 42 http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers
2  123   42 http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers
3  223   142   http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca
4  323   242   http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama
5  423   342   http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama
6  523   442   http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident
{noformat}

The {{adition.jar}} was built against the Cloudera cdh3u3 distribution
and contains the filter function {{CONTAINS}}; see
http://pastebin.com/Uwje7v1V .

The output can be seen at http://pastebin.com/yXY17mXx . Essentially, the
right-hand side of the OR in the FILTER expression is being ignored, so the
script returns just two lines

{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
{noformat}

instead of the expected three lines

{noformat}
(4,323,242,http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama)
(5,423,342,http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama)
(6,523,442,http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident)
{noformat}
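
For reference, the expected result can be reproduced outside of Pig. The following is only a minimal sketch: it assumes that {{CONTAINS}} behaves like a plain substring test (the real UDF is the one linked above and may differ), and the names {{contains}} and {{filter_ids}} are made up for illustration.

```python
# Referer column of the six sample rows, keyed by id.
REFERERS = {
    1: "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=flowers",
    2: "http://www.google.com/url&url=http%3A%2F%2Fwww.zeit.de%2Findex.php&q=towers",
    3: "http://www.google.com/url&url=http%3A%2F%2Fwww.nix-wie-weg.de&q=mallorca",
    4: "http://www.google.com/url&url=http%3A%2F%2Fwww.tagesschau.de&q=obama",
    5: "http://www.google.com/url&url=http%3A%2F%2Fwww.bild.de&q=obama",
    6: "http://www.google.com/url&url=http%3A%2F%2Fwww.example.com%2Fmypage.htm&q=praesident",
}


def contains(haystack, needle):
    """Assumed stand-in for the CONTAINS UDF: a plain substring test."""
    return haystack is not None and needle in haystack


def filter_ids(referers):
    """Ids of rows whose referer matches either side of the OR."""
    return [
        row_id
        for row_id, referer in sorted(referers.items())
        if contains(referer, "obama") or contains(referer, "praesident")
    ]


print(filter_ids(REFERERS))  # -> [4, 5, 6]
```

Under that assumption, rows 4, 5 and 6 pass the filter, which matches the three lines listed above; dropping the second operand of the OR leaves only rows 4 and 5, which is what pig-0.8.1 returns.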

Running the script with Pig 0.11.0 yields correct results:
http://pastebin.com/Cr5CkHui

See also the discussion on the pig-user mailing list:
http://www.mail-archive.com/user%40pig.apache.org/msg05278.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
