[ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated PIG-2296:
-----------------------------

    Attachment: er.head

Attaching sample data. The following script:
{code}
set mapred.max.split.size 20000;
set pig.maxCombinedSplitSize 200000;

er = LOAD '/tmp/er.head' AS (en : chararray, er : chararray);
tokenized = FOREACH er GENERATE TOKENIZE(en) AS en, TOKENIZE(er) AS er;
pairs = FOREACH tokenized GENERATE FLATTEN(en) AS en_word, FLATTEN(er) AS er_word;
pairs_long = FILTER pairs BY (SIZE(en_word) > 4) AND (SIZE(er_word) > 4);

pairs_l = LIMIT pairs_long 10;
DUMP pairs_l;
{code}
generates output like:
{code}
(bright,-)
(bright,o)
(bright,de)
(bright,pe)
...
{code}

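whereas with all optimizer rules turned off ({{-t All}}) it generates the expected output, in which both words are longer than 4 characters: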
{code}
(bright,Smith)
(bright,barbia)
(bright,bateau)
(bright,senina)
(bright,Winston)
(bright,aprilie)
...
{code}
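
For reference, the semantics the script is expected to have can be modeled in a few lines of Python. This is only a rough sketch and not part of the original report: the helper names ({{tokenize}}, {{long_pairs}}) and the sample row are made up for illustration, and plain whitespace splitting is assumed as a stand-in for Pig's TOKENIZE, which also splits on some punctuation. The point is that every pair surviving the FILTER should have both words longer than 4 characters, so a row such as {{(bright,-)}} should never appear.

```python
from itertools import product


def tokenize(text):
    # Assumption: whitespace splitting approximates Pig's TOKENIZE,
    # which also treats some punctuation characters as delimiters.
    return text.split()


def long_pairs(rows, min_size=4):
    """Model of: TOKENIZE both fields, FLATTEN both bags (cross product),
    then keep only pairs where both words are longer than min_size."""
    result = []
    for en, er in rows:
        for en_word, er_word in product(tokenize(en), tokenize(er)):
            if len(en_word) > min_size and len(er_word) > min_size:
                result.append((en_word, er_word))
    return result


# Hypothetical sample row, not taken from the attached er.head file.
sample_rows = [("bright day", "Winston - o de")]
print(long_pairs(sample_rows))
```

With this made-up row, only the pair of {{bright}} and {{Winston}} passes the filter; short tokens such as {{-}}, {{o}} and {{de}} are dropped, which matches the shape of the {{-t All}} output above.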

> case where optimizer causes incorrect filtering
> -----------------------------------------------
>
>                 Key: PIG-2296
>                 URL: https://issues.apache.org/jira/browse/PIG-2296
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: er.head
>
>
> I have a script that reproducibly generates incorrect filter results on Pig 
> 0.8.1. I haven't tried to reproduce it on 0.9 or trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
